
Count sketch

Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms.[1][2] It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton[3] in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[4] (these calculations require counting of the number of occurrences for the distinct elements of the stream).

The sketch is nearly identical[citation needed] to the Feature hashing algorithm by John Moody,[5] but differs in its use of hash functions with low dependence, which makes it more practical. In order to still have a high probability of success, the median trick is used to aggregate multiple count sketches, rather than the mean.

These properties allow it to be used for explicit kernel methods and bilinear pooling in neural networks, and make it a cornerstone of many numerical linear algebra algorithms.[6]

Intuitive explanation

The inventors of this data structure offer the following iterative explanation of its operation:[3]

  • At the simplest level, the output of a single hash function s mapping stream elements q into {+1, −1} feeds a single up/down counter C. After a single pass over the data, the frequency $n_q$ of a stream element q can be approximated, although extremely poorly, by the expected value $\mathbf{E}[C \cdot s(q)]$;
  • a straightforward way to improve the variance of the previous estimate is to use an array of different hash functions $s_i$, each connected to its own counter $C_i$. For each element q, $\mathbf{E}[C_i \cdot s_i(q)] = n_q$ still holds, so averaging across the i range will tighten the approximation;
  • the previous construct still has a major deficiency: if a lower-frequency-but-still-important output element a exhibits a hash collision with a high-frequency element, the estimate of $n_a$ can be significantly affected. Avoiding this requires reducing the frequency of collision counter updates between any two distinct elements. This is achieved by replacing each $C_i$ in the previous construct with an array of m counters (making the counter set into a two-dimensional matrix $C_{i,j}$), with the index j of the particular counter to be incremented or decremented selected via another set of hash functions $h_i$ that map element q into the range {1..m}. Since $\mathbf{E}[C_{i,\,h_i(q)} \cdot s_i(q)] = n_q$, averaging across all values of i will work (see the sketch following this list).
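
The following is a minimal Python sketch of this final construction, for illustration only: the per-row seeding of Python's built-in `hash` merely stands in for the pairwise-independent hash families used in the formal definition below.

```python
import random
from statistics import median

class CountSketch:
    """A d-by-m matrix of counters; row i uses its own hash pair (h_i, s_i)."""

    def __init__(self, d, m, seed=0):
        rng = random.Random(seed)
        self.m = m
        # One seed pair per row; a stand-in for pairwise-independent hash families.
        self.seeds = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(d)]
        self.C = [[0] * m for _ in range(d)]

    def _h(self, i, q):
        # Bucket index h_i(q) in {0, ..., m-1}.
        return hash((self.seeds[i][0], q)) % self.m

    def _s(self, i, q):
        # Sign s_i(q) in {+1, -1}.
        return 1 if hash((self.seeds[i][1], q)) % 2 == 0 else -1

    def add(self, q, count=1):
        # Update: C[i][h_i(q)] += s_i(q) * count for every row i.
        for i in range(len(self.C)):
            self.C[i][self._h(i, q)] += self._s(i, q) * count

    def estimate(self, q):
        # Median over rows of the per-row estimates s_i(q) * C[i][h_i(q)].
        return median(self._s(i, q) * self.C[i][self._h(i, q)]
                      for i in range(len(self.C)))

cs = CountSketch(d=5, m=256)
for token in ["a", "a", "b", "a", "c", "b"]:
    cs.add(token)
print(cs.estimate("a"))  # close to 3; exact here since collisions are unlikely with m=256
```

Note that the rows are aggregated with the median rather than the mean, matching the median trick mentioned above.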

Mathematical definition

1. For constants $w$ and $t$ (to be defined later), independently choose $d = 2t+1$ random hash functions $h_1, \dots, h_d$ and $s_1, \dots, s_d$ such that $h_i \colon [n] \to [w]$ and $s_i \colon [n] \to \{\pm 1\}$. It is necessary that the hash families from which $h_i$ and $s_i$ are chosen be pairwise independent.

2. For each item $q_i$ in the stream, add $s_j(q_i)$ to the $h_j(q_i)$-th bucket of the $j$-th hash.

At the end of this process, one has $wd$ sums $C_{i,j}$, where

$C_{i,j} = \sum_{k \,:\, h_i(k) = j} s_i(k).$

To estimate the count of $q$s, one computes the following value:

$r_q = \operatorname{median}_{i=1}^{\,d} \; s_i(q) \cdot C_{i,\,h_i(q)}.$

The values $s_i(q) \cdot C_{i,\,h_i(q)}$ are unbiased estimates of how many times $q$ has appeared in the stream.
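
To see why, write $n_{q'}$ for the true count of item $q'$ and expand one row (this step is implicit in the source and is spelled out here for completeness):

$\mathbf{E}\big[s_i(q)\, C_{i,\,h_i(q)}\big] = n_q + \sum_{q' \neq q} n_{q'}\, \mathbf{E}\big[\mathbf{1}\{h_i(q') = h_i(q)\}\, s_i(q)\, s_i(q')\big] = n_q,$

since $h_i$ is drawn independently of the signs and pairwise independence of the sign family gives $\mathbf{E}[s_i(q)\, s_i(q')] = 0$ for $q' \neq q$.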

The estimate $r_q$ has variance $O(\min\{m_1^2/w^2,\; m_2^2/w\})$, where $m_1$ is the length of the stream and $m_2^2$ is $\sum_q \big(\sum_i [q_i = q]\big)^2$.[7]

Furthermore, $r_q$ is guaranteed to never be more than $2 m_2 / \sqrt{w}$ off from the true value, with probability $1 - e^{-O(t)}$.

Vector formulation

Alternatively, Count-Sketch can be seen as a linear mapping with a non-linear reconstruction function. Let $M^{(i)} \in \{-1, 0, 1\}^{w \times n}$, for $i \in [d]$, be a collection of $d = 2t+1$ matrices, defined by

$M^{(i)}_{h_i(j),\, j} = s_i(j)$

for each column $j \in [n]$, and 0 everywhere else.

Then a vector $v \in \mathbb{R}^n$ is sketched by $C^{(i)} = M^{(i)} v \in \mathbb{R}^w$. To reconstruct $v$, we take $v^*_j = \operatorname{median}_i\, C^{(i)}_{h_i(j)}\, s_i(j)$. This gives the same guarantees as stated above, if we take $m_1 = \|v\|_1$ and $m_2 = \|v\|_2$.
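
As a small numerical illustration of this view (assumptions: plain pseudo-random hashes instead of pairwise-independent families, and toy dimensions), one can stack the $d$ sketches and reconstruct with the median:

```python
import numpy as np

rng = np.random.default_rng(1)
n, w, d = 1000, 64, 7                       # dimension, buckets per row, number of rows
h = rng.integers(0, w, size=(d, n))         # h_i : [n] -> [w]
s = rng.choice([-1, 1], size=(d, n))        # s_i : [n] -> {+1, -1}

v = np.zeros(n)
v[:10] = 100 * rng.standard_normal(10)      # a vector dominated by a few large entries

# Sketch: C^(i) = M^(i) v, i.e. bucket h_i(j) of row i accumulates s_i(j) * v_j.
C = np.zeros((d, w))
for i in range(d):
    np.add.at(C[i], h[i], s[i] * v)

# Non-linear reconstruction: v*_j = median_i C^(i)[h_i(j)] * s_i(j).
v_star = np.median(C[np.arange(d)[:, None], h] * s, axis=0)

print(np.abs(v_star - v).max())             # observed worst-case coordinate error
print(2 * np.linalg.norm(v) / np.sqrt(w))   # the 2*m_2/sqrt(w) bound stated above
```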

Relation to Tensor sketch

The count sketch projection of the outer product of two vectors is equivalent to the convolution of two component count sketches.

The count sketch computes a vector convolution

$C^{(1)} x \ast C^{(2)} x^{T}$, where $C^{(1)}$ and $C^{(2)}$ are independent count sketch matrices.

Pham and Pagh[8] show that this equals $C(x \otimes x^{T})$ – a count sketch $C$ of the outer product of vectors, where $\otimes$ denotes the Kronecker product.

The fast Fourier transform can be used to do fast convolution of count sketches. By using the face-splitting product[9][10][11] such structures can be computed much faster than normal matrices.
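
This identity is easy to check numerically. The snippet below (illustrative only, with simple random hashes) compares the FFT-based circular convolution of two count sketches of $x$ against a count sketch of the outer product $xx^T$ computed directly, using the combined bucket $(h_1(i) + h_2(j)) \bmod w$ and sign $s_1(i)\,s_2(j)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, w = 16, 64
h1, h2 = rng.integers(0, w, n), rng.integers(0, w, n)    # bucket hashes
s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)  # sign hashes
x = rng.standard_normal(n)

# Count sketches C^(1)x and C^(2)x of the same vector under independent hashes.
c1, c2 = np.zeros(w), np.zeros(w)
np.add.at(c1, h1, s1 * x)
np.add.at(c2, h2, s2 * x)

# Circular convolution of the two sketches via FFT.
conv = np.fft.irfft(np.fft.rfft(c1) * np.fft.rfft(c2), w)

# Count sketch of the outer product x x^T, with combined bucket and sign.
outer = np.zeros(w)
for i in range(n):
    for j in range(n):
        outer[(h1[i] + h2[j]) % w] += s1[i] * s2[j] * x[i] * x[j]

print(np.allclose(conv, outer))  # True: convolution of sketches = sketch of outer product
```

Once the two component sketches are available, the FFT route costs $O(w \log w)$ instead of forming the $n^2$-entry outer product explicitly, which is the point of the tensor sketch construction.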

See also

  • Count–min sketch – a version of the algorithm with smaller memory requirements and weaker error guarantees as a tradeoff
  • Tensor sketch

References

  1. ^ Faisal M. Algashaam; Kien Nguyen; Mohamed Alkanhal; Vinod Chandran; Wageeh Boles. "Multispectral Periocular Classification with Multimodal Compact Multi-Linear Pooling". IEEE Access, Vol. 5. 2017.
  2. ^ Ahle, Thomas; Knudsen, Jakob (2019-09-03). "Almost Optimal Tensor Sketch". ResearchGate. Retrieved 2020-07-11.
  3. ^ a b Charikar, Chen & Farach-Colton 2004.
  4. ^ Alon, Noga; Matias, Yossi; Szegedy, Mario. "The space complexity of approximating the frequency moments." Journal of Computer and System Sciences 58.1 (1999): 137–147.
  5. ^ Moody, John. "Fast learning in multi-resolution hierarchies." Advances in neural information processing systems. 1989.
  6. ^ Woodruff, David P. "Sketching as a Tool for Numerical Linear Algebra." Theoretical Computer Science 10.1-2 (2014): 1–157.
  7. ^ Larsen, Kasper Green, Rasmus Pagh, and Jakub Tětek. "CountSketches, Feature Hashing and the Median of Three." International Conference on Machine Learning. PMLR, 2021.
  8. ^ Pham, Ninh; Pagh, Rasmus (2013). "Fast and scalable polynomial kernels via explicit feature maps". SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. doi:10.1145/2487575.2487591.
  9. ^ Slyusar, V. I. (1998). "End products in matrices in radar applications" (PDF). Radioelectronics and Communications Systems. 41 (3): 50–53.
  10. ^ Slyusar, V. I. (1997-05-20). "Analytical model of the digital antenna array on a basis of face-splitting matrix products" (PDF). Proc. ICATT-97, Kyiv: 108–109.
  11. ^ Slyusar, V. I. (March 13, 1998). "A Family of Face Products of Matrices and its Properties" (PDF). Cybernetics and Systems Analysis. 35 (3) (1999): 379–384. doi:10.1007/BF02733426. S2CID 119661450.

Further reading

  • Charikar, Moses; Chen, Kevin; Farach-Colton, Martin (2004). "Finding frequent items in data streams" (PDF). Theoretical Computer Science. 312 (1). Elsevier BV: 3–15. doi:10.1016/s0304-3975(03)00400-6. ISSN 0304-3975.
  • Faisal M. Algashaam; Kien Nguyen; Mohamed Alkanhal; Vinod Chandran; Wageeh Boles. "Multispectral Periocular Classification with Multimodal Compact Multi-Linear Pooling". IEEE Access, Vol. 5. 2017.
  • Ahle, Thomas; Knudsen, Jakob (2019-09-03). "Almost Optimal Tensor Sketch". ResearchGate. Retrieved 2020-07-11.
