Wikipedia

Hash table

In computing, a hash table, also known as a hash map, is a data structure that implements an associative array, also called a dictionary; an associative array is an abstract data type that maps keys to values.[2] A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

Hash table
  Type: Unordered associative array
  Invented: 1953
  Time complexity in big O notation:
    Operation   Average    Worst case
    Space       Θ(n)[1]    O(n)
    Search      Θ(1)       O(n)
    Insert      Θ(1)       O(n)
    Delete      Θ(1)       O(n)
Figure: A small phone book as a hash table

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions must be handled by a collision resolution strategy, such as separate chaining or open addressing.

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pairs, at amortized constant average cost per operation.[3][4][5]

Hashing is an example of a space-time tradeoff. If memory is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a binary search or linear search can be used to retrieve the element.[6]: 458 

In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

History

The idea of hashing arose independently in different places. In January 1953, Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining. Open addressing was later proposed by A. D. Linh, building on Luhn's paper.[7]: 15  Around the same time, Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler.[8]: 124  Open addressing with linear probing is credited to Amdahl, although Ershov independently had the same idea.[8]: 124–125  The term "open addressing" was coined by W. Wesley Peterson in his article which discusses the problem of search in large files.[7]: 15 

The first published work on hashing with chaining is credited to Arnold Dumey, who discussed the idea of using the remainder modulo a prime as a hash function.[7]: 15  The word "hashing" first appeared in print in an article by Robert Morris.[8]: 126  A theoretical analysis of linear probing was originally submitted by Konheim and Weiss.[7]: 15 

Overview

An associative array stores a set of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of unique keys. In the hash table implementation of associative arrays, an array $A$ of length $m$ is partially filled with $n$ elements, where $m \ge n$. A value $x$ gets stored at an index location $A[h(x)]$, where $h$ is a hash function, and $h(x) < m$.[7]: 2  Under reasonable assumptions, hash tables have better time complexity bounds on search, delete, and insert operations in comparison to self-balancing binary search trees.[7]: 1 

Hash tables are also commonly used to implement sets, by omitting the stored value for each key and merely tracking whether the key is present.[7]: 1 

Load factor

A load factor $\alpha$ is a critical statistic of a hash table, and is defined as follows:[1]

$$\alpha = \frac{n}{m}$$

where
  •  $n$ is the number of entries occupied in the hash table.
  •  $m$ is the number of buckets.

The performance of the hash table deteriorates as the load factor $\alpha$ increases.[7]: 2  Therefore a hash table is resized or rehashed when the load factor $\alpha$ approaches 1.[9] A table is also resized if the load factor drops below $\alpha_{\max}/4$, where $\alpha_{\max}$ is the maximum load factor the table permits.[9] Acceptable values of the load factor $\alpha$ are typically around 0.6 to 0.75.[10][11]: 110 
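A minimal sketch of the load-factor bookkeeping described above. The threshold `ALPHA_MAX = 0.75` is an assumption taken from the upper end of the quoted 0.6–0.75 range, the shrink rule uses the $\alpha_{\max}/4$ bound from the text, and the function names are hypothetical.

```python
# Illustrative sketch; ALPHA_MAX and both function names are hypothetical.
ALPHA_MAX = 0.75  # assumed upper threshold, from the 0.6-0.75 range above


def load_factor(n_entries: int, n_buckets: int) -> float:
    """Return alpha = n / m."""
    return n_entries / n_buckets


def resize_decision(n_entries: int, n_buckets: int) -> str:
    """Return 'grow', 'shrink', or 'keep' for the current table state."""
    alpha = load_factor(n_entries, n_buckets)
    if alpha > ALPHA_MAX:
        return "grow"    # e.g. double the number of buckets
    if alpha < ALPHA_MAX / 4:
        return "shrink"  # e.g. halve the number of buckets
    return "keep"


print(load_factor(6, 8))       # 0.75
print(resize_decision(7, 8))   # grow
print(resize_decision(1, 16))  # shrink
```

Shrinking at a quarter of the upper threshold, rather than at half, leaves a gap between the two triggers so that a table hovering near one boundary does not alternate between growing and shrinking.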

Hash function

A hash function $h: U \to \{0, \ldots, m-1\}$ maps the universe $U$ of keys to array indices or slots within the table, so that $h(x) \in \{0, \ldots, m-1\}$ for each key $x \in S$, where $S \subseteq U$ is the set of keys stored in the table. The conventional implementations of hash functions are based on the integer universe assumption that all elements of the table stem from the universe $U = \{0, \ldots, u-1\}$, where the bit length of $u$ is confined within the word size of a computer architecture.[7]: 2 

A perfect hash function $h$ is defined as an injective function such that each element $x$ in $S$ maps to a unique value in $\{0, \ldots, m-1\}$.[12][13] A perfect hash function can be created if all the keys are known ahead of time.[12]

Integer universe assumption

The schemes of hashing used under the integer universe assumption include hashing by division, hashing by multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing.[7]: 2  Of these, hashing by division is the most commonly used scheme.[14]: 264 [11]: 110 

Hashing by division

The scheme in hashing by division is as follows:[7]: 2 

$$h(x) = M \bmod m$$

where $M$ is the hash digest of $x \in S$ and $m$ is the size of the table.
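A short sketch of hashing by division, assuming the digest $M$ is already a non-negative integer (for integer keys it is often the key itself). The function name and the table size 13 are illustrative; the output shows two keys colliding in the same slot.

```python
# Illustrative sketch; hash_division is a hypothetical name.
def hash_division(digest: int, m: int) -> int:
    """Hashing by division: h(x) = M mod m, where M is the key's integer digest."""
    return digest % m


m = 13  # a prime table size, a common choice for the division method
for key in (27, 40, 100):
    print(key, "->", hash_division(key, m))
# 27 -> 1
# 40 -> 1   (collides with 27)
# 100 -> 9
```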

Hashing by multiplication

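The scheme in hashing by multiplication is as follows:[7]: 2–3 

$$h(x) = \lfloor m \cdot ((M A) \bmod 1) \rfloor$$

where $M$ is the hash digest of $x$, $A$ is a real-valued constant with $0 < A < 1$, and $m$ is the size of the table. An advantage of hashing by multiplication is that the value of $m$ is not critical.[7]: 2–3  Although any value of $A$ produces a hash function, Donald Knuth suggests using the golden ratio, or more precisely its fractional part $A = (\sqrt{5} - 1)/2 \approx 0.618$.[7]: 3 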
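A sketch of hashing by multiplication using floating-point arithmetic for readability. The constant is the fractional part of the golden ratio, as suggested above; the function name is hypothetical. With $M = 123456$ and $m = 10000$, the product $MA \approx 76300.0041$ has fractional part $\approx 0.0041$, so the slot is $\lfloor 41.15 \rfloor = 41$.

```python
import math

# Illustrative sketch; hash_multiplication is a hypothetical name.
# Knuth's suggested constant: the fractional part of the golden ratio.
A = (math.sqrt(5) - 1) / 2


def hash_multiplication(digest: int, m: int, a: float = A) -> int:
    """Hashing by multiplication: h(x) = floor(m * ((M * A) mod 1))."""
    frac = (digest * a) % 1.0  # fractional part of M * A, in [0, 1)
    return math.floor(m * frac)


print(hash_multiplication(123456, 10000))  # 41
```

For large integer digests the floating-point product loses precision; integer implementations usually scale $A$ to a fixed-point word-sized constant and take the high-order bits of the product instead.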

Choosing a hash function

Uniform distribution of the hash values is a fundamental requirement of a hash function. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., Pearson's chi-squared test for discrete uniform distributions.[15][16]
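A sketch of the empirical check mentioned above: it computes Pearson's statistic $\chi^2 = \sum_i (O_i - E)^2 / E$ over the $m$ bucket counts, with expected count $E = n/m$. Under a uniform hash the statistic should be close to its number of degrees of freedom, $m - 1$; looking up a critical value is left to a statistics library. The function name is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable

# Illustrative sketch; chi_squared_statistic is a hypothetical name.


def chi_squared_statistic(keys: Iterable, m: int, h: Callable[[object], int]) -> float:
    """Pearson's chi-squared statistic for the bucket counts produced by h."""
    counts = Counter(h(k) % m for k in keys)
    n = sum(counts.values())
    expected = n / m
    # Sum over all m buckets, including empty ones (observed count 0).
    return sum((counts.get(i, 0) - expected) ** 2 / expected for i in range(m))


print(chi_squared_statistic(range(1000), 10, lambda k: k))              # 0.0
print(chi_squared_statistic(range(1000), 10, lambda k: (k % 2) * 5))    # 4000.0
```

The second hash sends every key to bucket 0 or 5, and the statistic is far above the 9 degrees of freedom, flagging the distribution as non-uniform.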

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a power of two. Here the index can be computed as some range of bits of the hash value. On the other hand, some hashing algorithms prefer the table size to be a prime number.[17]
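When the table size is a power of two, the index can be taken from the low-order bits of the hash value with a bit mask. A minimal sketch (the function name is hypothetical):

```python
# Illustrative sketch; bucket_index_pow2 is a hypothetical name.
def bucket_index_pow2(hash_value: int, m: int) -> int:
    """Keep the low-order log2(m) bits of the hash value."""
    if m <= 0 or m & (m - 1) != 0:
        raise ValueError("m must be a power of two")
    return hash_value & (m - 1)  # equals hash_value % m for Python ints


h = 0b1011_0110             # 182
print(bucket_index_pow2(h, 16))  # 6  (low 4 bits: 0110)
print(bucket_index_pow2(h, 32))  # 22 (low 5 bits: 10110)
```

Doubling the table exposes one more bit of the hash, so each entry either keeps its index $i$ or moves to $i + m_{\text{old}}$ (here 6 becomes 22).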

For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash is claimed to have particularly poor clustering behavior.[17][4]

K-independent hashing offers a way to prove that a certain hash function does not have bad keysets for a given type of hash table. A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing. Since K-independence can prove a hash function works, one can then focus on finding the fastest possible such hash function.[18]
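One standard way to obtain a k-independent family, sketched here for illustration, is the Wegman–Carter style construction: a polynomial of degree $k - 1$ with random coefficients over a prime field, reduced modulo $m$. The final reduction introduces a small bias, so the result is only approximately k-independent. The prime $2^{61} - 1$, the assumption that keys are integers below it, and the function name are choices made for this sketch.

```python
import random
from typing import Callable

# Illustrative sketch; P and make_k_independent_hash are hypothetical names.
P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in [0, P)


def make_k_independent_hash(k: int, m: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(x) = ((a_{k-1} x^{k-1} + ... + a_1 x + a_0) mod P) mod m at random."""
    coeffs = [rng.randrange(P) for _ in range(k)]  # coeffs[i] is a_i

    def h(x: int) -> int:
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, highest-degree coefficient first
            acc = (acc * x + a) % P
        return acc % m

    return h


h = make_k_independent_hash(5, 1024, random.Random(42))
print([h(x) for x in range(5)])
```

Drawing a fresh coefficient vector selects a new function from the family, which is how a table can "rehash with a new hash function" after too many collisions.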

Collision resolution

A search algorithm that uses hashing consists of two parts. The first part is computing a hash function which transforms the search key into an array index. Ideally, no two search keys hash to the same array index; however, this is not always the case, and it is impossible to guarantee for arbitrary, previously unseen data.[19]: 515  Hence the second part of the algorithm is collision resolution. The two common methods for collision resolution are separate chaining and open addressing.[6]: 458 

Separate chaining

 
Figure: Hash collision resolved by separate chaining
Figure: Hash collision by separate chaining with head records in the bucket array

In separate chaining, each array index holds a linked list of the key–value pairs that hash to that index. Colliding items are chained together in a singly linked list, which can be traversed to find the item with a given search key.[6]: 464  Collision resolution through chaining with linked lists is a common way to implement hash tables. Let $T$ and $x$ be the hash table and the node respectively; the operations are as follows:[14]: 258 

  • Chained-Hash-Insert(T, x): insert x at the head of linked list T[h(x.key)]
  • Chained-Hash-Search(T, k): search for an element with key k in linked list T[h(k)]
  • Chained-Hash-Delete(T, x): delete x from the linked list T[h(x.key)]
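The three chained operations can be sketched as a small class. This is an illustration under stated assumptions, not a reference implementation: Python's built-in `hash()` reduced modulo the bucket count stands in for $h$, each chain is a Python list (a dynamic array, so closer to array-based chaining than to a linked list), and unlike the head insertion in the pseudocode, `insert` scans the chain first so that keys stay unique. The class and method names are hypothetical.

```python
from typing import Any, Hashable, List, Optional, Tuple

# Illustrative sketch; ChainedHashTable and its methods are hypothetical names.


class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, m: int = 8) -> None:
        self.buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(m)]

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        return self.buckets[hash(key) % len(self.buckets)]  # T[h(k)]

    def insert(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key: Hashable) -> Optional[Any]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None  # unsuccessful search

    def delete(self, key: Hashable) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


t = ChainedHashTable(4)
for i in range(20):
    t.insert(i, i * i)
print(t.search(7))  # 49
```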

If the keys are comparable, either numerically or lexically, and each chain is kept sorted by key, an unsuccessful search can terminate as soon as it reaches a key larger than the one sought.[19]: 520–521 
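A sketch of a chain kept in sorted order, assuming integer keys and a plain Python list of `(key, value)` pairs as the chain; both function names are hypothetical. The search stops at the first key larger than the one sought.

```python
import bisect
from typing import List, Optional, Tuple

# Illustrative sketch; both function names are hypothetical.


def sorted_chain_search(chain: List[Tuple[int, str]], key: int) -> Optional[str]:
    """Search a chain kept sorted by key, stopping at the first larger key."""
    for k, v in chain:
        if k == key:
            return v
        if k > key:  # every later key is larger too: the search fails early
            return None
    return None


def sorted_chain_insert(chain: List[Tuple[int, str]], key: int, value: str) -> None:
    """Insert while preserving the chain's sort order (duplicate handling omitted)."""
    keys = [k for k, _ in chain]
    chain.insert(bisect.bisect_left(keys, key), (key, value))


chain: List[Tuple[int, str]] = []
for k in (30, 10, 20):
    sorted_chain_insert(chain, k, str(k))
print(chain)                         # [(10, '10'), (20, '20'), (30, '30')]
print(sorted_chain_search(chain, 15))  # None, after examining only 10 and 20
```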

Other data structures for separate chaining

If the keys are ordered, it could be efficient to use "self-organizing" concepts such as a self-balancing binary search tree, through which the theoretical worst case could be brought down to $O(\log n)$, although it introduces additional complexities.[19]: 521 

In dynamic perfect hashing, two-level hash tables are used to reduce the look-up complexity to a guaranteed $O(1)$ in the worst case. In this technique, the buckets of $k$ entries are organized as perfect hash tables with $k^2$ slots, providing constant worst-case lookup time and low amortized time for insertion.[20] A study shows array-based separate chaining to be 97% more performant than the standard linked list method under heavy load.[21]: 99 

Techniques such as using a fusion tree for each bucket also result in constant time for all operations with high probability.[22]

Caching and locality of reference

The linked list of a separate chaining implementation may not be cache-conscious because of poor spatial locality of reference: when the nodes of the linked list are scattered across memory, traversing the list during insertion and search may incur CPU cache inefficiencies.[21]: 91 

In cache-conscious variants, a dynamic array, which has been found to be more cache-friendly, is used where a linked list or self-balancing binary search tree is usually deployed for collision resolution through separate chaining, since the contiguous allocation pattern of the array can be exploited by hardware cache prefetchers and makes better use of the translation lookaside buffer, resulting in reduced access time and memory consumption.[23][24][25]

Open addressing

 
Figure: Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", which had previously collided with "John Smith".
Figure: This graph compares the average number of CPU cache misses required to look up elements in large hash tables (far exceeding the size of the cache) with chaining and linear probing. Linear probing performs better due to better locality of reference, though as the table gets full, its performance degrades drastically.

Open addressing is another collision resolution technique in which every entry record is stored in the bucket array itself, and collisions are resolved through probing. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates an unsuccessful search.[26]
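The insertion and search procedures just described can be sketched with linear probing (interval 1). Assumptions: Python's `hash()` modulo the table size stands in for the hash function, `None` marks an unused slot, and deletion is omitted because simply emptying a slot would cut probe sequences short (real implementations use tombstones or shift entries back). The class name is hypothetical.

```python
from typing import Any, List, Optional, Tuple

# Illustrative sketch; LinearProbingTable is a hypothetical name.


class LinearProbingTable:
    """Open addressing with linear probing; deletion is omitted for brevity."""

    def __init__(self, m: int = 8) -> None:
        self.slots: List[Optional[Tuple[Any, Any]]] = [None] * m  # None = unused

    def _probe(self, key):
        """Yield slot indices h(k), h(k)+1, ... (mod m), visiting each slot once."""
        m = len(self.slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def insert(self, key, value) -> None:
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is None or entry[0] == key:
                self.slots[j] = (key, value)
                return
        raise RuntimeError("table is full")  # bounded probe instead of looping forever

    def search(self, key):
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is None:  # an unused slot ends an unsuccessful search
                return None
            if entry[0] == key:
                return entry[1]
        return None


t = LinearProbingTable(4)
t.insert(1, "a")
t.insert(5, "b")  # 5 % 4 == 1 collides with key 1, so it lands in slot 2
print(t.slots)    # [None, (1, 'a'), (5, 'b'), None]
```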

Well-known probe sequences include:

  • Linear probing, in which the interval between probes is fixed (usually 1).[27]
  • Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the value given by the original hash computation.[28]: 272 
  • Double hashing, in which the interval between probes is computed by a secondary hash function.[28]: 272–273 
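The three probe sequences above can be written as functions giving the $i$-th slot examined. In this sketch the quadratic coefficients `c1 = 0`, `c2 = 1` and the secondary hash value `h2` are illustrative assumptions, and all names are hypothetical; for double hashing to reach every slot, `h2` should be nonzero and coprime to `m`.

```python
# Illustrative sketch; all function names and default coefficients are hypothetical.


def linear_probe(h: int, i: int, m: int) -> int:
    """i-th probe for linear probing with interval 1."""
    return (h + i) % m


def quadratic_probe(h: int, i: int, m: int, c1: int = 0, c2: int = 1) -> int:
    """i-th probe for quadratic probing: h + c1*i + c2*i^2 (mod m)."""
    return (h + c1 * i + c2 * i * i) % m


def double_hash_probe(h1: int, h2: int, i: int, m: int) -> int:
    """i-th probe for double hashing: the step h2 comes from a second hash function."""
    return (h1 + i * h2) % m


m = 11
print([linear_probe(3, i, m) for i in range(5)])          # [3, 4, 5, 6, 7]
print([quadratic_probe(3, i, m) for i in range(5)])       # [3, 4, 7, 1, 8]
print([double_hash_probe(3, 4, i, m) for i in range(5)])  # [3, 7, 0, 4, 8]
```

Linear probing examines neighbouring slots (good locality, but prone to clustering), while the other two spread successive probes out so that keys with the same starting slot follow different paths only in the double-hashing case.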

Open addressing may be slower than separate chaining because probe sequences grow longer as the load factor $\alpha$ approaches 1.[9][21]: 93  If the table is completely full (load factor 1), probing for a free slot or for an absent key results in an infinite loop unless the number of probes is bounded.[6]: 471  The average cost of linear probing depends on the hash function's ability to distribute the elements uniformly throughout the table to avoid clustering, since the formation of clusters increases search time.[6]: 472 

Caching and locality of reference

Because linear probing examines consecutive slots, it can make better use of the CPU cache through locality of reference, reducing memory latency.[27]

Other collision resolution techniques based on open addressing

Coalesced hashing

Coalesced hashing is a hybrid of separate chaining and open addressing in which the buckets or nodes are linked within the table.[29]: 6–8  The algorithm is ideally suited for fixed memory allocation.[29]: 4  A collision in coalesced hashing is resolved by finding the largest-indexed empty slot in the hash table and inserting the colliding value into that slot. The chain passing through the colliding hash address is then linked to the newly occupied slot.[29]: 8 
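A sketch of coalesced hashing in one fixed-size array. Assumptions: Python's `hash()` modulo the table size gives the home slot, a colliding key is appended at the end of the chain that passes through its home slot (the usual "late insertion" variant), `None` keys are not allowed, and deletion is omitted. The class name is hypothetical.

```python
from typing import Any, List, Optional

# Illustrative sketch; CoalescedHashTable is a hypothetical name.


class CoalescedHashTable:
    """Chains are links between slots of one fixed-size array."""

    def __init__(self, m: int = 11) -> None:
        self.keys: List[Optional[Any]] = [None] * m
        self.values: List[Any] = [None] * m
        self.next: List[int] = [-1] * m  # index of the next slot in the chain, -1 = end
        self.free = m - 1                # moves downward to find the largest empty slot

    def _home(self, key) -> int:
        return hash(key) % len(self.keys)

    def insert(self, key, value) -> None:
        i = self._home(key)
        if self.keys[i] is None:  # home slot empty: store directly
            self.keys[i], self.values[i] = key, value
            return
        while True:  # walk the chain that passes through slot i
            if self.keys[i] == key:
                self.values[i] = value  # existing key: update in place
                return
            if self.next[i] == -1:
                break
            i = self.next[i]
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1  # find the largest-indexed empty slot
        if self.free < 0:
            raise RuntimeError("table is full")
        j = self.free
        self.keys[j], self.values[j] = key, value
        self.next[i] = j  # link the end of the chain to the new slot

    def search(self, key):
        i = self._home(key)
        if self.keys[i] is None:
            return None
        while i != -1:
            if self.keys[i] == key:
                return self.values[i]
            i = self.next[i]
        return None


t = CoalescedHashTable(7)
for k in (1, 8, 6, 15):
    t.insert(k, str(k))
print(t.keys)  # [None, 1, None, None, 15, 6, 8]
```

Key 6 has its own home slot 6, but that slot already holds 8 from the chain of slot 1, so the two chains coalesce into 1 → 6 → 5 → 4.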

Cuckoo hashing

Cuckoo hashing is a form of open addressing collision resolution which guarantees $O(1)$ worst-case lookup complexity and constant amortized time for insertions. Collisions are resolved by maintaining two hash tables, each with its own hash function: the new item takes the colliding slot, and the element previously occupying that slot is displaced into the other hash table. The process continues until every key has its own slot in the tables; if the procedure enters an infinite loop, which is detected with a threshold on the number of iterations, both hash tables are rehashed with new hash functions and the procedure continues.[30]: 124–125 
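A sketch of cuckoo hashing with two tables. Assumptions, none of which come from the text: each table's hash function is Python's `hash()` of a (random seed, key) pair, the loop threshold is `max_loop = 32`, and on a failed insertion the sketch also doubles the table size while choosing new seeds, so that repeated failures cannot continue forever. The class name is hypothetical.

```python
import random
from typing import Any, List, Optional, Tuple

# Illustrative sketch; CuckooHashTable and its parameters are hypothetical.
Entry = Tuple[Any, Any]


class CuckooHashTable:
    def __init__(self, m: int = 8, max_loop: int = 32, seed: int = 0) -> None:
        self.m, self.max_loop = m, max_loop
        self.rng = random.Random(seed)
        self.tables: List[List[Optional[Entry]]] = [[None] * m, [None] * m]
        self._new_seeds()

    def _new_seeds(self) -> None:
        self.seeds = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _h(self, t: int, key) -> int:
        # Assumption: mixing a per-table seed into hash() yields a fresh function.
        return hash((self.seeds[t], key)) % self.m

    def search(self, key):
        for t in (0, 1):  # at most two probes: worst-case O(1) lookup
            entry = self.tables[t][self._h(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value) -> None:
        for t in (0, 1):  # update in place if the key is already stored
            j = self._h(t, key)
            entry = self.tables[t][j]
            if entry is not None and entry[0] == key:
                self.tables[t][j] = (key, value)
                return
        item: Optional[Entry] = (key, value)
        t = 0
        for _ in range(self.max_loop):
            j = self._h(t, item[0])
            item, self.tables[t][j] = self.tables[t][j], item  # place item, evict occupant
            if item is None:
                return
            t = 1 - t  # the evicted item moves to the other table
        self._rehash(item)  # threshold reached: probable cycle

    def _rehash(self, pending: Entry) -> None:
        entries = [e for table in self.tables for e in table if e is not None]
        entries.append(pending)
        self.m *= 2  # assumption: also grow the tables
        self.tables = [[None] * self.m, [None] * self.m]
        self._new_seeds()  # new hash functions
        for k, v in entries:
            self.insert(k, v)


table = CuckooHashTable(m=8, seed=1)
for i in range(40):
    table.insert(i, str(i))
print(table.search(17))  # '17'
```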

Hopscotch hashing

Hopscotch hashing is an open addressing based algorithm which combines elements of cuckoo hashing, linear probing and chaining through the notion of a neighbourhood of buckets: the subsequent buckets around any given occupied bucket, also called a "virtual" bucket.[31]: 351–352  The algorithm is designed to deliver better performance when the load factor of the hash table grows beyond 90%; it also provides high throughput in concurrent settings, and is thus well suited for implementing resizable concurrent hash tables.[31]: 350  The neighbourhood property of hopscotch hashing guarantees that the cost of finding the desired item in any bucket within the neighbourhood is very close to the cost of finding it in the bucket itself; the algorithm attempts to move each item into its neighbourhood, possibly at the cost of displacing other items.[31]: 352 

Each bucket within the hash table includes additional "hop-information": an H-bit bit array indicating the relative distance of the items that were originally hashed into the current virtual bucket, within H − 1 entries.[31]: 352  Let $k$ and $B_k$ be the key to be inserted and the bucket to which the key is hashed, respectively; the insertion procedure involves several cases so that the neighbourhood property of the algorithm is preserved:[31]: 352–353  if $B_k$ is empty, the element is inserted and the leftmost bit of the bitmap is set to 1; if it is not empty, linear probing is used to find an empty slot in the table, and the bitmap of the bucket is updated, followed by the insertion; if the empty slot is not within the range of the neighbourhood, i.e. H − 1, the empty slot is brought closer through successive swaps and hop-information updates of each bucket in accordance with the neighbourhood invariant.[31]: 353 
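A sequential sketch of the hopscotch insertion cases above, with neighbourhood size `H`. Assumptions: bit 0 of each `hop` bitmap plays the role of the "leftmost" bit (the home bucket itself), the hash function is injectable for testing, and resizing, deletion and the concurrency aspects are omitted, so a `RuntimeError` stands in for "resize the table". The class name is hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# Illustrative sketch; HopscotchTable is a hypothetical name.


class HopscotchTable:
    def __init__(self, m: int = 16, H: int = 4, hash_fn: Callable[[Any], int] = hash) -> None:
        self.m, self.H, self.hash_fn = m, H, hash_fn
        self.slots: List[Optional[Tuple[Any, Any]]] = [None] * m
        # hop[b] bit i set <=> slot (b + i) % m holds an item whose home bucket is b.
        self.hop: List[int] = [0] * m

    def _home(self, key) -> int:
        return self.hash_fn(key) % self.m

    def search(self, key):
        home = self._home(key)
        for i in range(self.H):  # only the H slots of the neighbourhood are examined
            if self.hop[home] >> i & 1:
                k, v = self.slots[(home + i) % self.m]
                if k == key:
                    return v
        return None

    def insert(self, key, value) -> None:
        home = self._home(key)
        for i in range(self.H):  # existing key: update in place
            if self.hop[home] >> i & 1 and self.slots[(home + i) % self.m][0] == key:
                self.slots[(home + i) % self.m] = (key, value)
                return
        # Linear probing for the first empty slot, at distance d from home.
        d = next((d for d in range(self.m) if self.slots[(home + d) % self.m] is None), None)
        if d is None:
            raise RuntimeError("table is full")
        free: Optional[int] = (home + d) % self.m
        while d >= self.H:  # empty slot outside the neighbourhood: hop it closer
            free = self._move_free_slot_closer(free)
            if free is None:
                raise RuntimeError("no displacement possible; the table should be resized")
            d = (free - home) % self.m
        self.slots[free] = (key, value)
        self.hop[home] |= 1 << d

    def _move_free_slot_closer(self, free: int) -> Optional[int]:
        """Move an item into `free` from an earlier slot; return the vacated slot."""
        for back in range(self.H - 1, 0, -1):
            b = (free - back) % self.m  # a bucket whose neighbourhood covers `free`
            for i in range(back):       # items of bucket b stored before `free`
                if self.hop[b] >> i & 1:
                    src = (b + i) % self.m
                    self.slots[free], self.slots[src] = self.slots[src], None
                    self.hop[b] = (self.hop[b] & ~(1 << i)) | (1 << back)
                    return src
        return None


t = HopscotchTable(m=8, H=2, hash_fn=lambda k: k // 10)
t.insert(0, "a")   # home 0, slot 0
t.insert(10, "b")  # home 1, slot 1
t.insert(1, "c")   # home 0: slot 2 is free but too far, so 10 hops from slot 1 to 2
print(t.slots[:3])  # [(0, 'a'), (1, 'c'), (10, 'b')]
```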

Robin Hood hashing

Robin Hood hashing is an open addressing based collision resolution algorithm; collisions are resolved in favour of the element that is farthest from its "home location", i.e. the bucket to which the item was hashed, as measured by its probe sequence length (PSL): an element with a shorter PSL is displaced to make room for it.[32]: 12  Although Robin Hood hashing does not change the theoretical search cost, it significantly reduces the variance of the distribution of the items over the buckets,[33]: 2  which helps in dealing with cluster formation in the hash table.[34] Each node within a hash table that uses Robin Hood hashing should be augmented to store an extra PSL value.[35] Let $x$ be the key to be inserted, $x.\text{psl}$ be the (incremental) PSL of $x$, $T$ be the hash table and $j$ be the index; the insertion procedure is as follows:[32]: 12–13 [36]: 5 

  • If $x.\text{psl} \le T[j].\text{psl}$: the iteration goes into the next bucket without attempting an external probe.
  • If $x.\text{psl} > T[j].\text{psl}$: insert the item $x$ into the bucket $j$; swap $x$ with $T[j]$, and let the displaced item be $x'$; continue the probe from the $(j+1)$st bucket to insert $x'$; repeat the procedure until every element is inserted.
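The two insertion cases can be sketched as follows, together with a lookup that uses the PSL to stop early. Assumptions: the table is a plain list of `(key, value, psl)` tuples or `None`, Python's `hash()` modulo the table size gives the home bucket, and the sketch refuses any insert into a full table (even an update) rather than lose a displaced item. Function names are hypothetical.

```python
from typing import Any, List, Optional, Tuple

# Illustrative sketch; both function names are hypothetical.
Slot = Optional[Tuple[Any, Any, int]]  # (key, value, psl)


def robin_hood_insert(table: List[Slot], key, value) -> None:
    if None not in table:
        raise RuntimeError("table is full")
    m = len(table)
    j = hash(key) % m
    item = (key, value, 0)  # x with x.psl = 0 at its home bucket
    while True:
        slot = table[j]
        if slot is None:  # empty bucket: the carried item settles here
            table[j] = item
            return
        if slot[0] == item[0]:  # same key: update the value, keep the PSL
            table[j] = (slot[0], item[1], slot[2])
            return
        if item[2] > slot[2]:  # x.psl > T[j].psl: x takes the bucket
            table[j], item = item, slot  # continue with the displaced item x'
        j = (j + 1) % m
        item = (item[0], item[1], item[2] + 1)  # one step further from its home


def robin_hood_search(table: List[Slot], key):
    m = len(table)
    j = hash(key) % m
    for psl in range(m):
        slot = table[j]
        if slot is None or slot[2] < psl:  # key would have displaced this entry
            return None
        if slot[0] == key:
            return slot[1]
        j = (j + 1) % m
    return None


table: List[Slot] = [None] * 8
for k in (0, 1, 8, 16):  # 0, 8 and 16 share home bucket 0
    robin_hood_insert(table, k, str(k))
print(table[:4])  # [(0, '0', 0), (8, '8', 1), (16, '16', 2), (1, '1', 2)]
```

Key 1 is pushed out of its home bucket twice, but it never ends up much farther from home than the keys that displaced it, which is the variance-reducing effect described above.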

Dynamic resizing

Repeated insertions cause the number of entries in a hash table to grow, which consequently increases the load factor; to maintain the amortized $O(1)$ performance of the lookup and insertion operations, a hash table is dynamically resized and its items are rehashed into the buckets of the new hash table,[9] since the items cannot simply be copied over: a different table size yields different indices under the modulo operation.[37] If a hash table becomes "too empty" after deleting some elements, resizing may be performed to avoid excessive memory usage.[38]

Resizing by moving all entries

Generally, a new hash table with double the size of the original is allocated privately, and every item in the original hash table is moved to the newly allocated one by recomputing its hash value and inserting it. Rehashing is simple but computationally expensive.[39]: 478–479 
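A sketch of all-at-once rehashing for a chained table, assuming buckets are lists of `(key, value)` pairs and the index is `hash(key) % m`; the function name is hypothetical. Every entry's index is recomputed for the new size.

```python
from typing import Any, Hashable, List, Tuple

# Illustrative sketch; rehash_all is a hypothetical name.
Bucket = List[Tuple[Hashable, Any]]


def rehash_all(buckets: List[Bucket], new_m: int) -> List[Bucket]:
    """Move every entry into a freshly allocated table of new_m buckets."""
    new_buckets: List[Bucket] = [[] for _ in range(new_m)]
    for bucket in buckets:
        for key, value in bucket:
            # The index must be recomputed: hash(key) % old_m and % new_m can differ.
            new_buckets[hash(key) % new_m].append((key, value))
    return new_buckets


old: List[Bucket] = [[] for _ in range(4)]
for k in (1, 5, 9, 2):
    old[k % 4].append((k, str(k)))  # 1, 5 and 9 all share bucket 1
new = rehash_all(old, 8)
print(new[1], new[5])  # [(1, '1'), (9, '9')] [(5, '5')]
```

The whole pass touches every entry once, which is why the cost is amortized over the many cheap insertions that preceded it.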

Alternatives to all-at-once rehashing

Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. If dynamic resizing cannot be avoided, a solution is to perform the resizing gradually, avoiding a storage blip (typically 50% of the new table's size) during rehashing and avoiding memory fragmentation that triggers heap compaction due to the deallocation of the large memory blocks used by the old hash table.[40]: 2–3  In such cases, the rehashing operation is done incrementally by extending the memory block previously allocated for the old hash table, so that the buckets of the hash table remain unaltered. A common approach for amortized rehashing involves maintaining two hash functions $h_{\text{old}}$ and $h_{\text{new}}$. The process of rehashing a bucket's items in accordance with the new hash function is termed cleaning, which is implemented through the command pattern by encapsulating operations such as $\text{Add}(\text{key})$, $\text{Get}(\text{key})$ and $\text{Delete}(\text{key})$ in a $\text{Lookup}(\text{key}, \text{command})$ wrapper, so that each element in the bucket is rehashed; the procedure is as follows:[40]: 3 

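  • Clean the $\text{Table}_{\text{old}}[h_{\text{old}}(\text{key})]$ bucket.
  • Clean the $\text{Table}_{\text{new}}[h_{\text{new}}(\text{key})]$ bucket.
  • The command gets executed.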
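A sketch of the cleaning procedure above, loosely following its steps rather than reproducing the cited paper's design. Assumptions: separate chaining, $h_{\text{old}}(k) = \text{hash}(k) \bmod m_{\text{old}}$ and $h_{\text{new}}(k) = \text{hash}(k) \bmod m_{\text{new}}$, the bucket array is extended in place by doubling, commands are plain closures instead of command classes, and each operation also cleans one extra pending bucket so the migration eventually completes (this last step is an addition). All names are hypothetical.

```python
from typing import Any, Callable, List, Tuple

# Illustrative sketch; all class and function names are hypothetical.
Bucket = List[Tuple[Any, Any]]


class IncrementalRehashTable:
    def __init__(self, m: int = 4) -> None:
        self.buckets: List[Bucket] = [[] for _ in range(m)]
        self.old_m = m                # modulus of h_old
        self.pending: List[int] = []  # buckets not yet cleaned since the last growth

    def h_old(self, key) -> int:
        return hash(key) % self.old_m

    def h_new(self, key) -> int:
        return hash(key) % len(self.buckets)

    def grow(self) -> None:
        for b in self.pending:  # finish any previous migration first (simplification)
            self._clean(b)
        self.old_m = len(self.buckets)
        self.buckets.extend([] for _ in range(self.old_m))  # extend the existing array
        self.pending = list(range(self.old_m))

    def _clean(self, b: int) -> None:
        """Move every item in bucket b whose h_new index differs from b."""
        stay = []
        for key, value in self.buckets[b]:
            target = self.h_new(key)
            if target == b:
                stay.append((key, value))
            else:
                self.buckets[target].append((key, value))
        self.buckets[b] = stay

    def lookup(self, key, command: Callable[[Bucket], Any]) -> Any:
        self._clean(self.h_old(key))  # clean Table_old[h_old(key)]
        self._clean(self.h_new(key))  # clean Table_new[h_new(key)]
        if self.pending:              # assumption: one extra cleaning step per call
            self._clean(self.pending.pop())
        if not self.pending:
            self.old_m = len(self.buckets)  # migration finished: h_old == h_new
        return command(self.buckets[self.h_new(key)])  # execute the command


def add(key, value) -> Callable[[Bucket], None]:
    def cmd(bucket: Bucket) -> None:
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
    return cmd


def get(key) -> Callable[[Bucket], Any]:
    return lambda bucket: next((v for k, v in bucket if k == key), None)


def delete(key) -> Callable[[Bucket], None]:
    def cmd(bucket: Bucket) -> None:
        bucket[:] = [(k, v) for k, v in bucket if k != key]
    return cmd


t = IncrementalRehashTable(4)
for k in range(10):
    t.lookup(k, add(k, k * 10))
t.grow()
print(t.lookup(5, get(5)))  # 50, found after cleaning bucket 1
```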

Linear hashing

Linear hashing is an implementation of the hash table which allows the table to grow or shrink dynamically, one bucket at a time.[41]
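A sketch of the growth side of linear hashing in the style of Litwin's scheme: buckets are split one at a time in round-robin order, using $h_i(k) = k \bmod (N_0 \cdot 2^i)$ for the current round $i$. Assumptions: Python's `hash()` supplies the integer $k$, a split is triggered whenever the average chain length exceeds `max_load = 2.0`, and shrinking is omitted. All names are hypothetical.

```python
from typing import Any, List, Tuple

# Illustrative sketch; LinearHashTable and its parameters are hypothetical.


class LinearHashTable:
    def __init__(self, n0: int = 2, max_load: float = 2.0) -> None:
        self.n0, self.max_load = n0, max_load
        self.level, self.split = 0, 0  # round number and next bucket to split
        self.buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(n0)]
        self.count = 0

    def _h(self, key, level: int) -> int:
        return hash(key) % (self.n0 * 2 ** level)

    def _address(self, key) -> int:
        a = self._h(key, self.level)
        if a < self.split:  # this bucket was already split in the current round
            a = self._h(key, self.level + 1)
        return a

    def insert(self, key, value) -> None:
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()

    def _split_one(self) -> None:
        old = self.buckets[self.split]
        self.buckets.append([])  # the table grows by exactly one bucket
        self.buckets[self.split] = []
        for k, v in old:  # redistribute with the next-level function
            self.buckets[self._h(k, self.level + 1)].append((k, v))
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:  # round complete
            self.level += 1
            self.split = 0

    def search(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return None


lh = LinearHashTable()
for k in range(50):
    lh.insert(k, -k)
print(len(lh.buckets))  # 25
```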

Performance

The performance of a hash table depends on the hash function's ability to generate quasi-random numbers ($\sigma$) for entries in the hash table, where $K$, $m$ and $h(x)$ denote the key, the number of buckets and the hash function, such that $\sigma = h(K) \bmod m$. If the hash function generates the same $\sigma$ for distinct keys ($K_1 \neq K_2$, $h(K_1) \bmod m = h(K_2) \bmod m$), a collision results, which is dealt with in a variety of ways. The constant time complexity ($O(1)$) of operations in a hash table presupposes that the hash function does not generate colliding indices; thus, the performance of the hash table is directly proportional to the chosen hash function's ability to disperse the indices.[42]: 1  However, constructing such a hash function is practically infeasible, so implementations rely on case-specific collision resolution techniques to achieve higher performance.[42]: 2 

Applications

Associative arrays

Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays.[28]

Database indexing

Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications.[43]

Caches

Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.[44][45]
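A sketch of such a cache, in which a colliding new entry simply overwrites the old one instead of being resolved. Assumptions: Python's `hash()` modulo the slot count picks the slot, and `load` is a caller-supplied function standing in for the slower storage. The class name is hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# Illustrative sketch; DirectMappedCache is a hypothetical name.


class DirectMappedCache:
    """Fixed-size cache: one entry per slot, collisions evict the old entry."""

    def __init__(self, m: int, load: Callable[[Any], Any]) -> None:
        self.slots: List[Optional[Tuple[Any, Any]]] = [None] * m
        self.load = load  # fetches a value from the slower backing store on a miss
        self.hits = self.misses = 0

    def get(self, key):
        i = hash(key) % len(self.slots)
        entry = self.slots[i]
        if entry is not None and entry[0] == key:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.load(key)
        self.slots[i] = (key, value)  # overwrite whatever was there
        return value


calls = []


def slow_square(k: int) -> int:
    calls.append(k)
    return k * k


c = DirectMappedCache(4, slow_square)
c.get(1)  # miss: loaded
c.get(1)  # hit
c.get(5)  # miss: 5 % 4 == 1, so it evicts key 1
c.get(1)  # miss again
print(calls, c.hits, c.misses)  # [1, 5, 1] 1 3
```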

Sets

Hash tables can be used in the implementation of set data structure, which can store unique values without any particular order; set is typically used in testing the membership of a value in the collection, rather than element retrieval.[46]

Transposition table

A transposition table is a complex hash table which stores information about each position that has been searched, as used in game-tree search programs such as chess engines.[47]

Implementations

Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.

In JavaScript, every value except for the 7 "primitive" data types is an "object", which uses either strings or guaranteed-unique "symbol" primitive values as keys for a hash map (integer keys are converted to strings). ECMAScript 6 also added Map and Set data structures.[48]

C++11 includes unordered_map in its standard library for storing keys and values of arbitrary types.[49]

Go's built-in map implements a hash table in the form of a type.[50]

The Java programming language includes the HashSet, HashMap, LinkedHashSet, and LinkedHashMap generic collections.[51]

Python's built-in dict implements a hash table in the form of a type.[52]

Ruby's built-in Hash uses the open addressing model from Ruby 2.4 onwards.[53]

The Rust programming language includes HashMap and HashSet as part of the Rust Standard Library.[54]


References

  1. ^ a b Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Massachusetts Institute of Technology. pp. 253–280. ISBN 978-0-262-03384-8.
  2. ^ Mehlhorn, Kurt; Sanders, Peter (2008), "4 Hash Tables and Associative Arrays", Algorithms and Data Structures: The Basic Toolbox (PDF), Springer, pp. 81–98
  3. ^ Leiserson, Charles E. (Fall 2005). "Lecture 13: Amortized Algorithms, Table Doubling, Potential Method". course MIT 6.046J/18.410J Introduction to Algorithms. Archived from the original on August 7, 2009.
  4. ^ a b Knuth, Donald (1998). The Art of Computer Programming. Vol. 3: Sorting and Searching (2nd ed.). Addison-Wesley. pp. 513–558. ISBN 978-0-201-89685-5.
  5. ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 221–252. ISBN 978-0-262-53196-2.
  6. ^ a b c d e Sedgewick, Robert; Wayne, Kevin (2011). Algorithms. Vol. 1 (4 ed.). Addison-Wesley Professional – via Princeton University, Department of Computer Science.
  7. ^ a b c d e f g h i j k l m n Mehta, Dinesh P.; Sahni, Sartaj (October 28, 2004). "9: Hash Tables". Handbook of Datastructures and Applications (1 ed.). Taylor & Francis. doi:10.1201/9781420035179. ISBN 978-1-58488-435-4.
  8. ^ a b c Konheim, Alan G. (June 21, 2010). Hashing in Computer Science: Fifty Years of Slicing and Dicing. John Wiley & Sons, Inc. doi:10.1002/9780470630617. ISBN 9780470630617.
  9. ^ a b c d Mayers, Andrew (2008). "CS 312: Hash tables and amortized analysis". Cornell University, Department of Computer Science. Archived from the original on April 26, 2021. Retrieved October 26, 2021 – via cs.cornell.edu.
  10. ^ Maurer, W.D.; Lewis, T.G. (March 1, 1975). "Hash Table Methods". ACM Computing Surveys. Journal of the ACM. 1 (1): 14. doi:10.1145/356643.356645. S2CID 17874775.
  11. ^ a b Owolabi, Olumide (February 1, 2003). "Empirical studies of some hashing functions". Information and Software Technology. Department of Mathematics and Computer Science, University of Port Harcourt. 45 (2): 109–112. doi:10.1016/S0950-5849(02)00174-X – via ScienceDirect.
  12. ^ a b Lu, Yi; Prabhakar, Balaji; Bonomi, Flavio (2006). Perfect Hashing for Network Applications. 2006 IEEE International Symposium on Information Theory. pp. 2774–2778. doi:10.1109/ISIT.2006.261567. ISBN 1-4244-0505-X. S2CID 1494710.
  13. ^ Belazzougui, Djamal; Botelho, Fabiano C.; Dietzfelbinger, Martin (2009). "Hash, displace, and compress" (PDF). Algorithms—ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7-9, 2009, Proceedings. Lecture Notes in Computer Science. Vol. 5757. Berlin: Springer. pp. 682–693. CiteSeerX 10.1.1.568.130. doi:10.1007/978-3-642-04128-0_61. MR 2557794.
  14. ^ a b Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). Massachusetts Institute of Technology. ISBN 978-0-262-53196-2.
  15. ^ Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
  16. ^ Plackett, Robin (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review. 51 (1): 59–72. doi:10.2307/1402731. JSTOR 1402731.
  17. ^ a b Wang, Thomas (March 1997). . Archived from the original on September 3, 1999. Retrieved May 10, 2015.
  18. ^ Wegman, Mark N.; Carter, J. Lawrence (1981). "New hash functions and their use in authentication and set equality" (PDF). Journal of Computer and System Sciences. 22 (3): 265–279. doi:10.1016/0022-0000(81)90033-7. Conference version in FOCS'79. Retrieved February 9, 2011.
  19. ^ a b c Donald E. Knuth (April 24, 1998). The Art of Computer Programming: Volume 3: Sorting and Searching. Addison-Wesley Professional. ISBN 978-0-201-89685-5.
  20. ^ Demaine, Erik; Lind, Jeff (Spring 2003). "Lecture 2" (PDF). 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Archived (PDF) from the original on June 15, 2010. Retrieved June 30, 2008.
  21. ^ a b c Askitis, Nikolas; Zobel, Justin (2005). Cache-Conscious Collision Resolution in String Hash Tables. International Symposium on String Processing and Information Retrieval. Springer Science+Business Media. pp. 91–102. doi:10.1007/11575832_1. ISBN 978-3-540-29740-6.
  22. ^ Willard, Dan E. (2000). "Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree". SIAM Journal on Computing. 29 (3): 1030–1049. doi:10.1137/S0097539797322425. MR 1740562.
  23. ^ Askitis, Nikolas; Sinha, Ranjan (2010). "Engineering scalable, cache and space efficient tries for strings". The VLDB Journal. 17 (5): 634. doi:10.1007/s00778-010-0183-9. ISSN 1066-8888. S2CID 432572.
  24. ^ Askitis, Nikolas; Zobel, Justin (October 2005). "Cache-conscious Collision Resolution in String Hash Tables". Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005). Vol. 3772/2005. pp. 91–102. doi:10.1007/11575832_11. ISBN 978-3-540-29740-6.
  25. ^ Askitis, Nikolas (2009). (PDF). Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009). Vol. 91. pp. 113–122. ISBN 978-1-920682-72-9. Archived from the original (PDF) on February 16, 2011. Retrieved June 13, 2010.
  26. ^ Tenenbaum, Aaron M.; Langsam, Yedidyah; Augenstein, Moshe J. (1990). Data Structures Using C. Prentice Hall. pp. 456–461, p. 472. ISBN 978-0-13-199746-2.
  27. ^ a b Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
  28. ^ a b c Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "11 Hash Tables", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 221–252, ISBN 0-262-03293-7.
  29. ^ a b c Vitter, Jeffery S.; Chen, Wen-Chin (1987). The design and analysis of coalesced hashing. New York, United States: Oxford University Press. ISBN 978-0-19-504182-8 – via Archive.org.
  30. ^ Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
  31. ^ a b c d e f Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). Hopscotch Hashing. International Symposium on Distributed Computing. Distributed Computing. Vol. 5218. Berlin, Heidelberg: Springer Publishing. pp. 350–364. doi:10.1007/978-3-540-87779-0_24. ISBN 978-3-540-87778-3 – via Springer Link.
  32. ^ a b Celis, Pedro (1986). Robin Hood Hashing (PDF). Ontario, Canada: University of Waterloo, Dept. of Computer Science. ISBN 031529700X. OCLC 14083698. Archived (PDF) from the original on November 1, 2021. Retrieved November 2, 2021.
  33. ^ Poblete, P.V.; Viola, A. (August 14, 2018). "Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions". Combinatorics, Probability and Computing. Cambridge University Press. 28 (4): 600–617. doi:10.1017/S0963548318000408. ISSN 1469-2163. S2CID 125374363. Retrieved November 1, 2021 – via Cambridge Core.
  34. ^ Clarkson, Michael (2014). "Lecture 13: Hash tables". Cornell University, Department of Computer Science. Archived from the original on October 7, 2021. Retrieved November 1, 2021 – via cs.cornell.edu.
  35. ^ Gries, David (2017). "JavaHyperText and Data Structure: Robin Hood Hashing" (PDF). Cornell University, Department of Computer Science. Archived (PDF) from the original on April 26, 2021. Retrieved November 2, 2021 – via cs.cornell.edu.
  36. ^ Celis, Pedro (March 28, 1988). External Robin Hood Hashing (PDF) (Technical report). Bloomington, Indiana: Indiana University, Department of Computer Science. 246. Archived (PDF) from the original on November 2, 2021. Retrieved November 2, 2021.
  37. ^ Goddard, Wayne (2021). "Chapter C5: Hash Tables" (PDF). Clemson University. pp. 15–16. Archived (PDF) from the original on November 9, 2021. Retrieved November 9, 2021 – via people.cs.clemson.edu.
  38. ^ Devadas, Srini; Demaine, Erik (February 25, 2011). "Intro to Algorithms: Resizing Hash Tables" (PDF). Massachusetts Institute of Technology, Department of Computer Science. Archived (PDF) from the original on May 7, 2021. Retrieved November 9, 2021 – via MIT OpenCourseWare.
  39. ^ Thareja, Reema (October 13, 2018). "Hashing and Collision". Data Structures Using C (2 ed.). Oxford University Press. ISBN 9780198099307.
  40. ^ a b Friedman, Scott; Krishnan, Anand; Leidefrost, Nicholas (March 18, 2003). "Hash Tables for Embedded and Real-time systems" (PDF). All Computer Science and Engineering Research. Washington University in St. Louis. doi:10.7936/K7WD3XXV. Archived (PDF) from the original on June 9, 2021. Retrieved November 9, 2021 – via Northwestern University, Department of Computer Science.
  41. ^ Litwin, Witold (1980). "Linear hashing: A new tool for file and table addressing" (PDF). Proc. 6th Conference on Very Large Databases. Carnegie Mellon University. pp. 212–223. Archived (PDF) from the original on May 6, 2021. Retrieved November 10, 2021 – via cs.cmu.edu.
  42. ^ a b Dijk, Tom Van (2010). "Analysing and Improving Hash Table Performance" (PDF). Netherlands: University of Twente. Archived (PDF) from the original on November 6, 2021. Retrieved December 31, 2021.
  43. ^ Lech Banachowski. "Indexes and external sorting". pl:Polsko-Japońska Akademia Technik Komputerowych. Archived from the original on March 26, 2022. Retrieved March 26, 2022.
  44. ^ Zhong, Liang; Zheng, Xueqian; Liu, Yong; Wang, Mengting; Cao, Yang (February 2020). "Cache hit ratio maximization in device-to-device communications overlaying cellular networks". China Communications. 17 (2): 232–238. doi:10.23919/jcc.2020.02.018. ISSN 1673-5447. S2CID 212649328.
  45. ^ Bottomley, James (January 1, 2004). "Understanding Caching". Linux Journal. Archived from the original on December 4, 2020. Retrieved April 16, 2022.
  46. ^ Jill Seaman (2014). (PDF). Texas State University. Archived from the original on April 1, 2022. Retrieved March 26, 2022.
  47. ^ "Transposition Table - Chessprogramming wiki". chessprogramming.org. Archived from the original on February 14, 2021. Retrieved May 1, 2020.
  48. ^ "JavaScript data types and data structures - JavaScript | MDN". developer.mozilla.org. Retrieved July 24, 2022.
  49. ^ (PDF). International Organization for Standardization. pp. 812–813. Archived from the original (PDF) on January 21, 2022. Retrieved February 8, 2022.
  50. ^ "The Go Programming Language Specification". go.dev. Retrieved January 1, 2023.
  51. ^ "Lesson: Implementations (The Java™ Tutorials > Collections)". docs.oracle.com. Archived from the original on January 18, 2017. Retrieved April 27, 2018.
  52. ^ Zhang, Juan; Jia, Yunwei (2020). "Redis rehash optimization based on machine learning". Journal of Physics: Conference Series. 1453 (1): 3. Bibcode:2020JPhCS1453a2048Z. doi:10.1088/1742-6596/1453/1/012048. S2CID 215943738.
  53. ^ Jonan Scheffler (December 25, 2016). "Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding". heroku.com. Archived from the original on July 3, 2019. Retrieved July 3, 2019.
  54. ^ "doc.rust-lang.org". Archived from the original on December 8, 2022. Retrieved December 14, 2022.

Further reading

  • Tamassia, Roberto; Goodrich, Michael T. (2006). "Chapter Nine: Maps and Dictionaries". Data structures and algorithms in Java : [updated for Java 5.0] (4th ed.). Hoboken, NJ: Wiley. pp. 369–418. ISBN 978-0-471-73884-8.
  • McKenzie, B. J.; Harries, R.; Bell, T. (February 1990). "Selecting a hashing algorithm". Software: Practice and Experience. 20 (2): 209–224. doi:10.1002/spe.4380200207. hdl:10092/9691. S2CID 12854386.

External links

  • NIST entry on hash tables
  • Open Data Structures – Chapter 5 – Hash Tables, Pat Morin
  • MIT's Introduction to Algorithms: Hashing 1 MIT OCW lecture Video
  • MIT's Introduction to Algorithms: Hashing 2 MIT OCW lecture Video

hash, table, confused, with, hash, list, hash, tree, rehash, redirects, here, south, park, episode, rehash, south, park, computing, hash, table, also, known, hash, data, structure, that, implements, associative, array, dictionary, abstract, data, type, that, m. Not to be confused with Hash list or Hash tree Rehash redirects here For the South Park episode see Rehash South Park In computing a hash table also known as hash map is a data structure that implements an associative array or dictionary It is an abstract data type that maps keys to values 2 A hash table uses a hash function to compute an index also called a hash code into an array of buckets or slots from which the desired value can be found During lookup the key is hashed and the resulting hash indicates where the corresponding value is stored Hash tableTypeUnordered associative arrayInvented1953Time complexity in big O notationAlgorithmAverageWorst caseSpace8 n 1 O n Search8 1 O n Insert8 1 O n Delete8 1 O n A small phone book as a hash table Ideally the hash function will assign each key to a unique bucket but most hash table designs employ an imperfect hash function which might cause hash collisions where the hash function generates the same index for more than one key Such collisions are typically accommodated in some way In a well dimensioned hash table the average time complexity for each lookup is independent of the number of elements stored in the table Many hash table designs also allow arbitrary insertions and deletions of key value pairs at amortized constant average cost per operation 3 4 5 Hashing is an example of a space time tradeoff If memory is infinite the entire key can be used directly as an index to locate its value with a single memory access On the other hand if infinite time is available values can be stored without regard for their keys and a binary search or linear search can be used to retrieve the element 6 458 In many situations hash tables turn out to be on average more 
efficient than search trees or any other table lookup structure For this reason they are widely used in many kinds of computer software particularly for associative arrays database indexing caches and sets Contents 1 History 2 Overview 2 1 Load factor 3 Hash function 3 1 Integer universe assumption 3 1 1 Hashing by division 3 1 2 Hashing by multiplication 3 2 Choosing a hash function 4 Collision resolution 4 1 Separate chaining 4 1 1 Other data structures for separate chaining 4 1 2 Caching and locality of reference 4 2 Open addressing 4 2 1 Caching and locality of reference 4 2 2 Other collision resolution techniques based on open addressing 4 2 2 1 Coalesced hashing 4 2 2 2 Cuckoo hashing 4 2 2 3 Hopscotch hashing 4 2 2 4 Robin Hood hashing 5 Dynamic resizing 5 1 Resizing by moving all entries 5 2 Alternatives to all at once rehashing 5 2 1 Linear hashing 6 Performance 7 Applications 7 1 Associative arrays 7 2 Database indexing 7 3 Caches 7 4 Sets 7 5 Transposition table 8 Implementations 9 See also 10 References 11 Further reading 12 External linksHistory EditThe idea of hashing arose independently in different places In January 1953 Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining Open addressing was later proposed by A D Linh building on Luhn s paper 7 15 Around the same time Gene Amdahl Elaine M McGraw Nathaniel Rochester and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler 8 124 Open addressing with linear probing is credited to Amdahl although Ershov independently had the same idea 8 124 125 The term open addressing was coined by W Wesley Peterson on his article which discusses the problem of search in large files 7 15 The first published work on hashing with chaining is credited to Arnold Dumey who discussed the idea of using remainder module a prime as a hash function 7 15 The word hashing was first published by an article by Robert Morris 8 126 A theoretical analysis of linear probing was 
submitted originally by Konheim and Weiss 7 15 Overview EditAn associative array stores a set of key value pairs and allows insertion deletion and lookup search with the constraint of unique keys In the hash table implementation of associative arrays an array A displaystyle A of length m displaystyle m is partially filled with n displaystyle n elements where m n displaystyle m geq n A value x displaystyle x gets stored at an index location A h x displaystyle A h x where h displaystyle h is a hash function and h x lt m displaystyle h x lt m 7 2 Under reasonable assumptions hash tables have better time complexity bounds on search delete and insert operations in comparison to self balancing binary search trees 7 1 Hash tables are also commonly used to implement sets by omitting the stored value for each key and merely tracking whether the key is present 7 1 Load factor Edit A load factor a displaystyle alpha is a critical statistic of a hash table and is defined as follows 1 load factor a n m displaystyle text load factor alpha frac n m where n displaystyle n is the number of entries occupied in the hash table m displaystyle m is the number of buckets The performance of the hash table deteriorates in relation to the load factor a displaystyle alpha 7 2 Therefore a hash table is resized or rehashed if the load factor a displaystyle alpha approaches 1 9 A table is also resized if the load factor drops below a max 4 displaystyle alpha max 4 9 Acceptable figures of load factor a displaystyle alpha should range around 0 6 to 0 75 10 11 110 Hash function EditA hash function h displaystyle h maps the universe U displaystyle U of keys h U 0 m 1 displaystyle h U rightarrow 0 m 1 to array indices or slots within the table for each h x 0 m 1 displaystyle h x in 0 m 1 where x S displaystyle x in S and m lt n displaystyle m lt n The conventional implementations of hash functions are based on the integer universe assumption that all elements of the table stem from the universe U 0 
u 1 displaystyle U 0 u 1 where the bit length of u displaystyle u is confined within the word size of a computer architecture 7 2 A perfect hash function h displaystyle h is defined as an injective function such that each element x displaystyle x in S displaystyle S maps to a unique value in 0 m 1 displaystyle 0 m 1 12 13 A perfect hash function can be created if all the keys are known ahead of time 12 Integer universe assumption Edit The schemes of hashing used in integer universe assumption include hashing by division hashing by multiplication universal hashing dynamic perfect hashing and static perfect hashing 7 2 However hashing by division is the commonly used scheme 14 264 11 110 Hashing by division Edit The scheme in hashing by division is as follows 7 2 h x M mod n displaystyle h x M bmod n Where M displaystyle M is the hash digest of x S displaystyle x in S and n displaystyle n is the size of the table Hashing by multiplication Edit The scheme in hashing by multiplication is as follows 7 2 3 h k n M A mod 1 displaystyle h k lfloor n bigl MA bmod 1 bigr rfloor Where A displaystyle A is a real valued constant An advantage of the hashing by multiplication is that the m displaystyle m is not critical 7 2 3 Although any value A displaystyle A produces a hash function Donald Knuth suggests using the golden ratio 7 3 Choosing a hash function Edit Uniform distribution of the hash values is a fundamental requirement of a hash function A non uniform distribution increases the number of collisions and the cost of resolving them Uniformity is sometimes difficult to ensure by design but may be evaluated empirically using statistical tests e g a Pearson s chi squared test for discrete uniform distributions 15 16 The distribution needs to be uniform only for table sizes that occur in the application In particular if one uses dynamic resizing with exact doubling and halving of the table size then the hash function needs to be uniform only when the size is a power of two 
Here the index can be computed as some range of bits of the hash function On the other hand some hashing algorithms prefer to have the size be a prime number 17 For open addressing schemes the hash function should also avoid clustering the mapping of two or more keys to consecutive slots Such clustering may cause the lookup cost to skyrocket even if the load factor is low and collisions are infrequent The popular multiplicative hash is claimed to have particularly poor clustering behavior 17 4 K independent hashing offers a way to prove a certain hash function does not have bad keysets for a given type of hashtable A number of K independence results are known for collision resolution schemes such as linear probing and cuckoo hashing Since K independence can prove a hash function works one can then focus on finding the fastest possible such hash function 18 Collision resolution EditSee also 2 choice hashing A search algorithm that uses hashing consists of two parts The first part is computing a hash function which transforms the search key into an array index The ideal case is such that no two search keys hashes to the same array index However this is not always the case and is impossible to guarantee for unseen given data 19 515 Hence the second part of the algorithm is collision resolution The two common methods for collision resolution are separate chaining and open addressing 6 458 Separate chaining Edit Hash collision resolved by separate chaining Hash collision by separate chaining with head records in the bucket array In separate chaining the process involves building a linked list with key value pair for each search array index The collided items are chained together through a single linked list which can be traversed to access the item with a unique search key 6 464 Collision resolution through chaining with linked list is a common method of implementation of hash tables Let T displaystyle T and x displaystyle x be the hash table and the node respectively 
the operation involves as follows 14 258 Chained Hash Insert T k insert x at the head of linked list T h k Chained Hash Search T k search for an element with key k in linked list T h k Chained Hash Delete T k delete x from the linked list T h k If the element is comparable either numerically or lexically and inserted into the list by maintaining the total order it results in faster termination of the unsuccessful searches 19 520 521 Other data structures for separate chaining Edit If the keys are ordered it could be efficient to use self organizing concepts such as using a self balancing binary search tree through which the theoretical worst case could be brought down to O log n displaystyle O log n although it introduces additional complexities 19 521 In dynamic perfect hashing two level hash tables are used to reduce the look up complexity to be a guaranteed O 1 displaystyle O 1 in the worst case In this technique the buckets of k displaystyle k entries are organized as perfect hash tables with k 2 displaystyle k 2 slots providing constant worst case lookup time and low amortized time for insertion 20 A study shows array based separate chaining to be 97 more performant when compared to the standard linked list method under heavy load 21 99 Techniques such as using fusion tree for each buckets also result in constant time for all operations with high probability 22 Caching and locality of reference Edit The linked list of separate chaining implementation may not be cache conscious due to spatial locality locality of reference when the nodes of the linked list are scattered across memory thus the list traversal during insert and search may entail CPU cache inefficiencies 21 91 In cache conscious variants a dynamic array found to be more cache friendly is used in the place where a linked list or self balancing binary search trees is usually deployed for collision resolution through separate chaining since the contiguous allocation pattern of the array could be 
exploited by hardware cache prefetchers such as translation lookaside buffer resulting in reduced access time and memory consumption 23 24 25 Open addressing Edit Main article Open addressing Hash collision resolved by open addressing with linear probing interval 1 Note that Ted Baker has a unique hash but nevertheless collided with Sandra Dee that had previously collided with John Smith This graph compares the average number of CPU cache misses required to look up elements in large hash tables far exceeding size of the cache with chaining and linear probing Linear probing performs better due to better locality of reference though as the table gets full its performance degrades drastically Open addressing is another collision resolution technique in which every entry record is stored in the bucket array itself and the hash resolution is performed through probing When a new entry has to be inserted the buckets are examined starting with the hashed to slot and proceeding in some probe sequence until an unoccupied slot is found When searching for an entry the buckets are scanned in the same sequence until either the target record is found or an unused array slot is found which indicates an unsuccessful search 26 Well known probe sequences include Linear probing in which the interval between probes is fixed usually 1 27 Quadratic probing in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the value given by the original hash computation 28 272 Double hashing in which the interval between probes is computed by a secondary hash function 28 272 273 The performance of open addressing may be slower compared to separate chaining since the probe sequence increases when the load factor a displaystyle alpha approaches 1 9 21 93 The probing results in an infinite loop if the load factor reaches 1 in the case of a completely filled table 6 471 The average cost of linear probing depends on the hash function s ability to 
distribute the elements uniformly throughout the table to avoid clustering since formation of clusters would result in increased search time 6 472 Caching and locality of reference Edit Since the slots are located in successive locations linear probing could lead to better utilization of CPU cache due to locality of references resulting in reduced memory latency 27 Other collision resolution techniques based on open addressing Edit Coalesced hashing Edit Main article Coalesced hashing Coalesced hashing is a hybrid of both separate chaining and open addressing in which the buckets or nodes link within the table 29 6 8 The algorithm is ideally suited for fixed memory allocation 29 4 The collision in coalesced hashing is resolved by identifying the largest indexed empty slot on the hash table then the colliding value is inserted into that slot The bucket is also linked to the inserted node s slot which contains its colliding hash address 29 8 Cuckoo hashing Edit Main article Cuckoo hashing Cuckoo hashing is a form of open addressing collision resolution technique which guarantees O 1 displaystyle O 1 worst case lookup complexity and constant amortized time for insertions The collision is resolved through maintaining two hash tables each having its own hashing function and collided slot gets replaced with the given item and the preoccupied element of the slot gets displaced into the other hash table The process continues until every key has its own spot in the empty buckets of the tables if the procedure enters into infinite loop which is identified through maintaining a threshold loop counter both hash tables get rehashed with newer hash functions and the procedure continues 30 124 125 Hopscotch hashing Edit Main article Hopscotch hashing Hopscotch hashing is an open addressing based algorithm which combines the elements of cuckoo hashing linear probing and chaining through the notion of a neighbourhood of buckets the subsequent buckets around any given occupied 
bucket also called a virtual bucket 31 351 352 The algorithm is designed to deliver better performance when the load factor of the hash table grows beyond 90 it also provides high throughput in concurrent settings thus well suited for implementing resizable concurrent hash table 31 350 The neighbourhood characteristic of hopscotch hashing guarantees a property that the cost of finding the desired item from any given buckets within the neighbourhood is very close to the cost of finding it in the bucket itself the algorithm attempts to be an item into its neighbourhood with a possible cost involved in displacing other items 31 352 Each bucket within the hash table includes an additional hop information an H bit bit array for indicating the relative distance of the item which was originally hashed into the current virtual bucket within H 1 entries 31 352 Let k displaystyle k and B k displaystyle Bk be the key to be inserted and bucket to which the key is hashed into respectively several cases are involved in the insertion procedure such that the neighbourhood property of the algorithm is vowed 31 352 353 if B k displaystyle Bk is empty the element is inserted and the leftmost bit of bitmap is set to 1 if not empty linear probing is used for finding an empty slot in the table the bitmap of the bucket gets updated followed by the insertion if the empty slot is not within the range of the neighbourhood i e H 1 subsequent swap and hop info bit array manipulation of each bucket is performed in accordance with its neighbourhood invariant properties 31 353 Robin Hood hashing Edit Robin hood hashing is an open addressing based collision resolution algorithm the collisions are resolved through favouring the displacement of the element that is farthest or longest probe sequence length PSL from its home location i e the bucket to which the item was hashed into 32 12 Although robin hood hashing does not change the theoretical search cost it significantly affects the variance of 
the distribution of the items on the buckets 33 2 i e dealing with cluster formation in the hash table 34 Each node within the hash table that uses robin hood hashing should be augmented to store an extra PSL value 35 Let x displaystyle x be the key to be inserted x p s l displaystyle x psl be the incremental PSL length of x displaystyle x T displaystyle T be the hash table and j displaystyle j be the index the insertion procedure is as follows 32 12 13 36 5 If x p s l T j p s l displaystyle x psl leq T j psl the iteration goes into the next bucket without attempting an external probe If x p s l gt T j p s l displaystyle x psl gt T j psl insert the item x displaystyle x into the bucket j displaystyle j swap x displaystyle x with T j displaystyle T j let it be x displaystyle x continue the probe from the j 1 displaystyle j 1 st bucket to insert x displaystyle x repeat the procedure until every element is inserted Dynamic resizing EditRepeated insertions cause the number of entries in a hash table to grow which consequently increases the load factor to maintain the amortized O 1 displaystyle O 1 performance of the lookup and insertion operations a hash table is dynamically resized and the items of the tables are rehashed into the buckets of the new hash table 9 since the items cannot be copied over as varying table sizes results in different hash value due to modulo operation 37 If a hash table becomes too empty after deleting some elements resizing may be performed to avoid excessive memory usage 38 Resizing by moving all entries Edit Generally a new hash table with a size double that of the original hash table gets allocated privately and every item in the original hash table gets moved to the newly allocated one by computing the hash values of the items followed by the insertion operation Rehashing is computationally expensive despite its simplicity 39 478 479 Alternatives to all at once rehashing Edit Some hash table implementations notably in real time systems 
cannot pay the price of enlarging the hash table all at once because it may interrupt time critical operations If one cannot avoid dynamic resizing a solution is to perform the resizing gradually to avoid storage blip typically at 50 of new table s size during rehashing and to avoid memory fragmentation that triggers heap compaction due to deallocation of large memory blocks caused by the old hash table 40 2 3 In such case the rehashing operation is done incrementally through extending prior memory block allocated for the old hash table such that the buckets of the hash table remain unaltered A common approach for amortized rehashing involves maintaining two hash functions h old displaystyle h text old and h new displaystyle h text new The process of rehashing a bucket s items in accordance with the new hash function is termed as cleaning which is implemented through command pattern by encapsulating the operations such as A d d k e y displaystyle mathrm Add mathrm key G e t k e y displaystyle mathrm Get mathrm key and D e l e t e k e y displaystyle mathrm Delete mathrm key through a L o o k u p k e y command displaystyle mathrm Lookup mathrm key text command wrapper such that each element in the bucket gets rehashed and its procedure involve as follows 40 3 Clean T a b l e h old k e y displaystyle mathrm Table h text old mathrm key bucket Clean T a b l e h new k e y displaystyle mathrm Table h text new mathrm key bucket The command gets executed Linear hashing Edit Main article Linear hashing Linear hashing is an implementation of the hash table which enables dynamic growths or shrinks of the table one bucket at a time 41 Performance EditThe performance of a hash table is dependent on the hash function s ability in generating quasi random numbers s displaystyle sigma for entries in the hash table where K displaystyle K n displaystyle n and h x displaystyle h x denotes the key number of buckets and the hash function such that s h K n displaystyle sigma h K n If the 
hash function generates the same s displaystyle sigma for distinct keys K 1 K 2 h K 1 h K 2 displaystyle K 1 neq K 2 h K 1 h K 2 this results in collision which is dealt with in a variety of ways The constant time complexity O 1 displaystyle O 1 of the operation in a hash table is presupposed on the condition that the hash function doesn t generate colliding indices thus the performance of the hash table is directly proportional to the chosen hash function s ability to disperse the indices 42 1 However construction of such a hash function is practically infeasible that being so implementations depend on case specific collision resolution techniques in achieving higher performance 42 2 Applications EditAssociative arrays Edit Main article Associative array Hash tables are commonly used to implement many types of in memory tables They are used to implement associative arrays 28 Database indexing Edit Hash tables may also be used as disk based data structures and database indices such as in dbm although B trees are more popular in these applications 43 Caches Edit Main article Cache computing Hash tables can be used to implement caches auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media In this application hash collisions can be handled by discarding one of the two colliding entries usually erasing the old item that is currently stored in the table and overwriting it with the new item so every item in the table has a unique hash value 44 45 Sets Edit Main article Set data structure Hash tables can be used in the implementation of set data structure which can store unique values without any particular order set is typically used in testing the membership of a value in the collection rather than element retrieval 46 Transposition table Edit Main article Transposition table A transposition table to a complex Hash Table which stores information about each section that has been searched 47 Implementations EditMany 
programming languages provide hash table functionality either as built in associative arrays or as standard library modules In JavaScript every value except for 7 primitive data types is called an object which uses either integers strings or guaranteed unique symbol primitive values as keys for a hash map ECMAScript 6 also added Map and Set data structures 48 C 11 includes a href Unordered map C 2B 2B html class mw redirect title Unordered map C unordered map a in its standard library for storing keys and values of arbitrary types 49 Go s built in map implements a hash table in the form of a type 50 Java programming language includes the HashSet HashMap LinkedHashSet and LinkedHashMap generic collections 51 Python s built in dict implements a hash table in the form of a type 52 Ruby s built in Hash uses the open addressing model from Ruby 2 4 onwards 53 Rust programming language includes HashMap HashSet as part of the Rust Standard Library 54 See also EditRabin Karp string search algorithm Stable hashing Consistent hashing Extendible hashing Lazy deletion Pearson hashing PhotoDNA Search data structure Concurrent hash table Bloom filter Hash array mapped trie Distributed hash tableReferences Edit a b Cormen Thomas H Leiserson Charles E Rivest Ronald L Stein Clifford 2009 Introduction to Algorithms 3rd ed Massachusetts Institute of Technology pp 253 280 ISBN 978 0 262 03384 8 Mehlhorn Kurt Sanders Peter 2008 4 Hash Tables and Associative Arrays Algorithms and Data Structures The Basic Toolbox PDF Springer pp 81 98 Leiserson Charles E Fall 2005 Lecture 13 Amortized Algorithms Table Doubling Potential Method course MIT 6 046J 18 410J Introduction to Algorithms Archived from the original on August 7 2009 a b Knuth Donald 1998 The Art of Computer Programming Vol 3 Sorting and Searching 2nd ed Addison Wesley pp 513 558 ISBN 978 0 201 89685 5 Cormen Thomas H Leiserson Charles E Rivest Ronald L Stein Clifford 2001 Chapter 11 Hash Tables Introduction to Algorithms 2nd ed MIT 
Press and McGraw Hill pp 221 252 ISBN 978 0 262 53196 2 a b c d e Sedgewick Robert Wayne Kevin 2011 Algorithms Vol 1 4 ed Addison Wesley Professional via Princeton University Department of Computer Science a b c d e f g h i j k l m n Mehta Dinesh P Sahni Sartaj October 28 2004 9 Hash Tables Handbook of Datastructures and Applications 1 ed Taylor amp Francis doi 10 1201 9781420035179 ISBN 978 1 58488 435 4 a b c Konheim Alan G June 21 2010 Hashing in Computer Science Fifty Years of Slicing and Dicing John Wiley amp Sons Inc doi 10 1002 9780470630617 ISBN 9780470630617 a b c d Mayers Andrew 2008 CS 312 Hash tables and amortized analysis Cornell University Department of Computer Science Archived from the original on April 26 2021 Retrieved October 26 2021 via cs cornell edu Maurer W D Lewis T G March 1 1975 Hash Table Methods ACM Computing Surveys Journal of the ACM 1 1 14 doi 10 1145 356643 356645 S2CID 17874775 a b Owolabi Olumide February 1 2003 Empirical studies of some hashing functions Information and Software Technology Department of Mathematics and Computer Science University of Port Harcourt 45 2 109 112 doi 10 1016 S0950 5849 02 00174 X via ScienceDirect a b Lu Yi Prabhakar Balaji Bonomi Flavio 2006 Perfect Hashing for Network Applications 2006 IEEE International Symposium on Information Theory pp 2774 2778 doi 10 1109 ISIT 2006 261567 ISBN 1 4244 0505 X S2CID 1494710 Belazzougui Djamal Botelho Fabiano C Dietzfelbinger Martin 2009 Hash displace and compress PDF Algorithms ESA 2009 17th Annual European Symposium Copenhagen Denmark September 7 9 2009 Proceedings Lecture Notes in Computer Science Vol 5757 Berlin Springer pp 682 693 CiteSeerX 10 1 1 568 130 doi 10 1007 978 3 642 04128 0 61 MR 2557794 a b Cormen Thomas H Leiserson Charles E Rivest Ronald L Stein Clifford 2001 Chapter 11 Hash Tables Introduction to Algorithms 2nd ed Massachusetts Institute of Technology ISBN 978 0 262 53196 2 Pearson Karl 1900 On the criterion that a given system of deviations 
from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
Plackett, Robin (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review. 51 (1): 59–72. doi:10.2307/1402731. JSTOR 1402731.
Wang, Thomas (March 1997). "Prime Double Hash Table". Archived from the original on September 3, 1999. Retrieved May 10, 2015.
Wegman, Mark N.; Carter, J. Lawrence (1981). "New hash functions and their use in authentication and set equality" (PDF). Journal of Computer and System Sciences. 22 (3): 265–279. doi:10.1016/0022-0000(81)90033-7. Conference version in FOCS'79. Retrieved February 9, 2011.
Knuth, Donald E. (April 24, 1998). The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley Professional. ISBN 978-0-201-89685-5.
Demaine, Erik; Lind, Jeff (Spring 2003). "Lecture 2" (PDF). 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Archived (PDF) from the original on June 15, 2010. Retrieved June 30, 2008.
Askitis, Nikolas; Zobel, Justin (2005). "Cache-Conscious Collision Resolution in String Hash Tables". International Symposium on String Processing and Information Retrieval. Springer Science+Business Media. pp. 91–102. doi:10.1007/11575832_1. ISBN 978-3-540-29740-6.
Willard, Dan E. (2000). "Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree". SIAM Journal on Computing. 29 (3): 1030–1049. doi:10.1137/S0097539797322425. MR 1740562.
Askitis, Nikolas; Sinha, Ranjan (2010). "Engineering scalable, cache and space efficient tries for strings". The VLDB Journal. 17 (5): 634. doi:10.1007/s00778-010-0183-9. ISSN 1066-8888. S2CID 432572.
Askitis, Nikolas; Zobel, Justin (October 2005). "Cache-conscious Collision Resolution in String Hash Tables". Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005). Vol. 3772/2005. pp. 91–102. doi:10.1007/11575832_11. ISBN 978-3-540-29740-6.
Askitis, Nikolas (2009). "Fast and Compact Hash Tables for Integer Keys" (PDF). Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009). Vol. 91. pp. 113–122. ISBN 978-1-920682-72-9. Archived from the original (PDF) on February 16, 2011. Retrieved June 13, 2010.
Tenenbaum, Aaron M.; Langsam, Yedidyah; Augenstein, Moshe J. (1990). Data Structures Using C. Prentice Hall. pp. 456–461, p. 472. ISBN 978-0-13-199746-2.
Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms – ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "11: Hash Tables". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 221–252. ISBN 0-262-03293-7.
Vitter, Jeffery S.; Chen, Wen-Chin (1987). The design and analysis of coalesced hashing. New York, United States: Oxford University Press. ISBN 978-0-19-504182-8. Via Archive.org.
Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms – ESA 2001. Lecture Notes in Computer Science. Vol. 2161. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). "Hopscotch Hashing". International Symposium on Distributed Computing. Distributed Computing. Vol. 5218. Berlin, Heidelberg: Springer Publishing. pp. 350–364. doi:10.1007/978-3-540-87779-0_24. ISBN 978-3-540-87778-3. Via Springer Link.
Celis, Pedro (1986). Robin Hood Hashing (PDF). Ontario, Canada: University of Waterloo, Dept. of Computer Science. ISBN 031529700X. OCLC 14083698. Archived (PDF) from the original on November 1, 2021. Retrieved November 2, 2021.
Poblete, P. V.; Viola, A. (August 14, 2018). "Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions". Combinatorics, Probability and Computing. Cambridge University Press. 28 (4): 600–617. doi:10.1017/S0963548318000408. ISSN 1469-2163. S2CID 125374363. Retrieved November 1, 2021. Via Cambridge Core.
Clarkson, Michael (2014). "Lecture 13: Hash tables". Cornell University, Department of Computer Science. Archived from the original on October 7, 2021. Retrieved November 1, 2021. Via cs.cornell.edu.
Gries, David (2017). "JavaHyperText and Data Structure: Robin Hood Hashing" (PDF). Cornell University, Department of Computer Science. Archived (PDF) from the original on April 26, 2021. Retrieved November 2, 2021. Via cs.cornell.edu.
Celis, Pedro (March 28, 1988). External Robin Hood Hashing (PDF) (Technical report). Bloomington, Indiana: Indiana University, Department of Computer Science. 246. Archived (PDF) from the original on November 2, 2021. Retrieved November 2, 2021.
Goddard, Wayne (2021). "Chapter C5: Hash Tables" (PDF). Clemson University. pp. 15–16. Archived (PDF) from the original on November 9, 2021. Retrieved November 9, 2021. Via people.cs.clemson.edu.
Devadas, Srini; Demaine, Erik (February 25, 2011). "Intro to Algorithms: Resizing Hash Tables" (PDF). Massachusetts Institute of Technology, Department of Computer Science. Archived (PDF) from the original on May 7, 2021. Retrieved November 9, 2021. Via MIT OpenCourseWare.
Thareja, Reema (October 13, 2018). "Hashing and Collision". Data Structures Using C (2nd ed.). Oxford University Press. ISBN 9780198099307.
Friedman, Scott; Krishnan, Anand; Leidefrost, Nicholas (March 18, 2003). "Hash Tables for Embedded and Real-time systems" (PDF). All Computer Science and Engineering Research. Washington University in St. Louis. doi:10.7936/K7WD3XXV. Archived (PDF) from the original on June 9, 2021. Retrieved November 9, 2021. Via Northwestern University, Department of Computer Science.
Litwin, Witold (1980). "Linear hashing: A new tool for file and table addressing" (PDF). Proc. 6th Conference on Very Large Databases. Carnegie Mellon University. pp. 212–223. Archived (PDF) from the original on May 6, 2021. Retrieved November 10, 2021. Via cs.cmu.edu.
Dijk, Tom Van (2010). "Analysing and Improving Hash Table Performance" (PDF). Netherlands: University of Twente. Archived (PDF) from the original on November 6, 2021. Retrieved December 31, 2021.
Banachowski, Lech. "Indexes and external sorting" (in Polish). Polsko-Japońska Akademia Technik Komputerowych. Archived from the original on March 26, 2022. Retrieved March 26, 2022.
Zhong, Liang; Zheng, Xueqian; Liu, Yong; Wang, Mengting; Cao, Yang (February 2020). "Cache hit ratio maximization in device-to-device communications overlaying cellular networks". China Communications. 17 (2): 232–238. doi:10.23919/jcc.2020.02.018. ISSN 1673-5447. S2CID 212649328.
Bottomley, James (January 1, 2004). "Understanding Caching". Linux Journal. Archived from the original on December 4, 2020. Retrieved April 16, 2022.
Seaman, Jill (2014). "Set & Hash Tables" (PDF). Texas State University. Archived from the original on April 1, 2022. Retrieved March 26, 2022.
"Transposition Table". Chessprogramming wiki. chessprogramming.org. Archived from the original on February 14, 2021. Retrieved May 1, 2020.
"JavaScript data types and data structures - JavaScript | MDN". developer.mozilla.org. Retrieved July 24, 2022.
"Programming language C - Technical Specification" (PDF). International Organization for Standardization. pp. 812–813. Archived from the original (PDF) on January 21, 2022. Retrieved February 8, 2022.
"The Go Programming Language Specification". go.dev. Retrieved January 1, 2023.
"Lesson: Implementations (The Java Tutorials > Collections)". docs.oracle.com. Archived from the original on January 18, 2017. Retrieved April 27, 2018.
Zhang, Juan; Jia, Yunwei (2020). "Redis rehash optimization based on machine learning". Journal of Physics: Conference Series. 1453 (1): 3. Bibcode:2020JPhCS1453a2048Z. doi:10.1088/1742-6596/1453/1/012048. S2CID 215943738.
Scheffler, Jonan (December 25, 2016). "Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding". heroku.com. Archived from the original on July 3, 2019. Retrieved July 3, 2019.
doc.rust-lang.org. Archived from the original on December 8, 2022. Retrieved December 14, 2022.

Further reading

Tamassia, Roberto; Goodrich, Michael T. (2006). "Chapter Nine: Maps and Dictionaries". Data structures and algorithms in Java: updated for Java 5.0 (4th ed.). Hoboken, NJ: Wiley. pp. 369–418. ISBN 978-0-471-73884-8.
McKenzie, B. J.; Harries, R.; Bell, T. (February 1990). "Selecting a hashing algorithm". Software: Practice and Experience. 20 (2): 209–224. doi:10.1002/spe.4380200207. hdl:10092/9691. S2CID 12854386.

External links

Wikimedia Commons has media related to Hash tables.
Wikibooks has a book on the topic of: Data Structures/Hash Tables.
NIST entry on hash tables
Open Data Structures, Chapter 5: Hash Tables, Pat Morin
MIT's Introduction to Algorithms: Hashing 1 (MIT OCW lecture video)
MIT's Introduction to Algorithms: Hashing 2 (MIT OCW lecture video)

Retrieved from https://en.wikipedia.org/w/index.php?title=Hash_table&oldid=1140283527