fbpx
Wikipedia

Cuckoo filter

A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". A cuckoo filter can also delete existing items, which is not supported by Bloom filters. In addition, for applications that store many items and target moderately low false positive rates, cuckoo filters can achieve lower space overhead than space-optimized Bloom filters.[1]

Cuckoo filters were first described in 2014.[2]

Algorithm description

A cuckoo filter uses a hash table based on cuckoo hashing to store the fingerprints of items.[2] The data structure is broken into buckets of some size  . To insert the fingerprint of an item  , one first computes two potential buckets   and   where   could go. These buckets are calculated using the formula

 
 

Note that, due to the symmetry of the XOR operation, one can compute   from  , and   from  . As defined above,  ; it follows that  . These properties are what make it possible to store the fingerprints with cuckoo hashing.

The fingerprint of   is placed into one of buckets   and  . If the buckets are full, then one of the fingerprints in the bucket is evicted using cuckoo hashing, and placed into the other bucket where it can go. If that bucket, in turn, is also full, then that may trigger another eviction, etc.

The hash table can achieve both high utilization (thanks to cuckoo hashing), and compactness because only fingerprints are stored. Lookup and delete operations of a cuckoo filter are straightforward.[2]

There are a maximum of two buckets to check by   and  . If found, the appropriate lookup or delete operation can be performed in   time. Often, in practice,   is a constant.

In order for the hash table to offer theoretical guarantees, the fingerprint size   must be at least   bits.[2][3][4] Subject to this constraint, cuckoo filters guarantee a false-positive rate of at most  .[2]

Comparison to Bloom filters

A cuckoo filter is similar to a Bloom filter in that they both are fast and compact, and they may both return false positives as answers to set-membership queries:

  • Space-optimal Bloom filters use   bits of space per inserted key, where   is the false positive rate. A cuckoo filter requires   space per key[2] where   is the hash table load factor, which can be   based on the cuckoo filter's setting. Note that the information theoretical lower bound requires   bits for each item. Both bloom filters and cuckoo filters with low load can be compressed when not in use.
  • On a positive lookup, a space-optimal Bloom filter requires a constant   memory accesses into the bit array, whereas a cuckoo filter requires at most   memory accesses, which can be a constant in practice.
  • Cuckoo filters have degraded insertion speed after reaching a load threshold, when table expanding is recommended. In contrast, Bloom filters can keep inserting new items at the cost of a higher false positive rate before expansion.
  • Bloom filters offer fast union and approximate intersection operations using cheap bitwise operations, which can also be applied to compressed bloom filters if streaming compression is used.

Limitations

  • A cuckoo filter can only delete items that are known to be inserted before.
  • Insertion can fail and rehashing is required like other cuckoo hash tables. Note that the amortized insertion complexity is still  .[5]
  • Cuckoo filters require a fingerprint size   of at least   bits. This means that the space per key must be at least   bits, even if   is large. In practice,   is chosen to be large enough that this is not a major issue.[2]

References

  1. ^ Michael D. Mitzenmacher. "Bloom Filters, Cuckoo Hashing, Cuckoo Filters, Adaptive Cuckoo Filters, and Learned Bloom Filters".
  2. ^ a b c d e f g Fan, Bin; Andersen, Dave G.; Kaminsky, Michael; Mitzenmacher, Michael D. (2014). Cuckoo filter: Practically better than Bloom. Proc. 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT '14). Sydney, Australia. pp. 75–88. doi:10.1145/2674005.2674994. ISBN 9781450332798.
  3. ^ Eppstein, David (22 June 2016). Cuckoo filter: Simplification and analysis. Proc. 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs). Vol. 53. Reykjavik, Iceland. pp. 8:1–8:12. arXiv:1604.06067. doi:10.4230/LIPIcs.SWAT.2016.8.
  4. ^ Fleming, Noah (17 May 2018). Cuckoo Hashing and Cuckoo Filters (PDF) (Technical report). University of Toronto.
  5. ^ Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo hashing". Proc. 9th Annual European Symposium on Algorithms (ESA 2001). Lecture Notes in Computer Science. Vol. 2161. Århus, Denmark. pp. 121–133. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.

External links

  • Probabilistic Filters By Example – A tutorial comparing Cuckoo and Bloom filters.

cuckoo, filter, cuckoo, filter, space, efficient, probabilistic, data, structure, that, used, test, whether, element, member, like, bloom, filter, does, false, positive, matches, possible, false, negatives, other, words, query, returns, either, possibly, defin. A cuckoo filter is a space efficient probabilistic data structure that is used to test whether an element is a member of a set like a Bloom filter does False positive matches are possible but false negatives are not in other words a query returns either possibly in set or definitely not in set A cuckoo filter can also delete existing items which is not supported by Bloom filters In addition for applications that store many items and target moderately low false positive rates cuckoo filters can achieve lower space overhead than space optimized Bloom filters 1 Cuckoo filters were first described in 2014 2 Contents 1 Algorithm description 2 Comparison to Bloom filters 3 Limitations 4 References 5 External linksAlgorithm description EditA cuckoo filter uses a hash table based on cuckoo hashing to store the fingerprints of items 2 The data structure is broken into buckets of some size b displaystyle b To insert the fingerprint of an item x displaystyle x one first computes two potential buckets h 1 x displaystyle h 1 x and h 2 x displaystyle h 2 x where x displaystyle x could go These buckets are calculated using the formula h 1 x hash x displaystyle h 1 x text hash x h 2 x h 1 x hash fingerprint x displaystyle h 2 x h 1 x oplus text hash text fingerprint x Note that due to the symmetry of the XOR operation one can compute h 2 x displaystyle h 2 x from h 1 x displaystyle h 1 x and h 1 x displaystyle h 1 x from h 2 x displaystyle h 2 x As defined above h 2 x h 1 x hash fingerprint x displaystyle h 2 x h 1 x oplus text hash text fingerprint x it follows that h 1 x h 2 x hash fingerprint x displaystyle h 1 x h 2 x oplus text hash text fingerprint x These properties are what make it possible to store the fingerprints with cuckoo hashing The fingerprint of x displaystyle x is placed into one of buckets h 1 x displaystyle h 1 x and h 2 x displaystyle h 2 x If the buckets are full then one of the fingerprints in the bucket is evicted using cuckoo hashing and placed into the other bucket where it can go If that bucket in turn is also full then that may trigger another eviction etc The hash table can achieve both high utilization thanks to cuckoo hashing and compactness because only fingerprints are stored Lookup and delete operations of a cuckoo filter are straightforward 2 There are a maximum of two buckets to check by h 1 x displaystyle h 1 x and h 2 x displaystyle h 2 x If found the appropriate lookup or delete operation can be performed in O b displaystyle O b time Often in practice b displaystyle b is a constant In order for the hash table to offer theoretical guarantees the fingerprint size f displaystyle f must be at least W log n b displaystyle Omega log n b bits 2 3 4 Subject to this constraint cuckoo filters guarantee a false positive rate of at most ϵ b 2 f 1 displaystyle epsilon leq b 2 f 1 2 Comparison to Bloom filters EditA cuckoo filter is similar to a Bloom filter in that they both are fast and compact and they may both return false positives as answers to set membership queries Space optimal Bloom filters use 1 44 log 2 1 ϵ displaystyle 1 44 log 2 1 epsilon bits of space per inserted key where ϵ displaystyle epsilon is the false positive rate A cuckoo filter requires log 2 1 ϵ 1 log 2 b a displaystyle log 2 1 epsilon 1 log 2 b alpha space per key 2 where a displaystyle alpha is the hash table load factor which can be 95 5 displaystyle 95 5 based on the cuckoo filter s setting Note that the information theoretical lower bound requires log 2 1 ϵ displaystyle log 2 1 epsilon bits for each item Both bloom filters and cuckoo filters with low load can be compressed when not in use On a positive lookup a space optimal Bloom filter requires a constant log 2 1 ϵ displaystyle log 2 1 epsilon memory accesses into the bit array whereas a cuckoo filter requires at most 2 b displaystyle 2b memory accesses which can be a constant in practice Cuckoo filters have degraded insertion speed after reaching a load threshold when table expanding is recommended In contrast Bloom filters can keep inserting new items at the cost of a higher false positive rate before expansion Bloom filters offer fast union and approximate intersection operations using cheap bitwise operations which can also be applied to compressed bloom filters if streaming compression is used Limitations EditA cuckoo filter can only delete items that are known to be inserted before Insertion can fail and rehashing is required like other cuckoo hash tables Note that the amortized insertion complexity is still O 1 displaystyle O 1 5 Cuckoo filters require a fingerprint size f displaystyle f of at least W log n b displaystyle Omega log n b bits This means that the space per key must be at least W log n b displaystyle Omega log n b bits even if ϵ displaystyle epsilon is large In practice b displaystyle b is chosen to be large enough that this is not a major issue 2 References Edit Computer programming portal Michael D Mitzenmacher Bloom Filters Cuckoo Hashing Cuckoo Filters Adaptive Cuckoo Filters and Learned Bloom Filters a b c d e f g Fan Bin Andersen Dave G Kaminsky Michael Mitzenmacher Michael D 2014 Cuckoo filter Practically better than Bloom Proc 10th ACM International on Conference on Emerging Networking Experiments and Technologies CoNEXT 14 Sydney Australia pp 75 88 doi 10 1145 2674005 2674994 ISBN 9781450332798 Eppstein David 22 June 2016 Cuckoo filter Simplification and analysis Proc 15th Scandinavian Symposium and Workshops on Algorithm Theory SWAT 2016 Leibniz International Proceedings in Informatics LIPIcs Vol 53 Reykjavik Iceland pp 8 1 8 12 arXiv 1604 06067 doi 10 4230 LIPIcs SWAT 2016 8 Fleming Noah 17 May 2018 Cuckoo Hashing and Cuckoo Filters PDF Technical report University of Toronto Pagh Rasmus Rodler Flemming Friche 2001 Cuckoo hashing Proc 9th Annual European Symposium on Algorithms ESA 2001 Lecture Notes in Computer Science Vol 2161 Arhus Denmark pp 121 133 doi 10 1007 3 540 44676 1 10 ISBN 978 3 540 42493 2 External links EditProbabilistic Filters By Example A tutorial comparing Cuckoo and Bloom filters Retrieved from https en wikipedia org w index php title Cuckoo filter amp oldid 1166981160, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.