fbpx
Wikipedia

Dynamic perfect hashing

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.[1][2][3] While more memory-intensive than its hash table counterparts,[citation needed] this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

Details edit

Static case edit

FKS Scheme edit

The problem of optimal static hashing was first solved in general by Fredman, Komlós and Szemerédi.[4] In their 1984 paper,[1] they detail a two-tiered hash table scheme in which each bucket of the (first-level) hash table corresponds to a separate second-level hash table. Keys are hashed twice—the first hash value maps to a certain bucket in the first-level hash table; the second hash value gives the position of that entry in that bucket's second-level hash table. The second-level table is guaranteed to be collision-free (i.e. perfect hashing) upon construction. Consequently, the look-up cost is guaranteed to be O(1) in the worst-case.[2]

In the static case, we are given a set with a total of x entries, each one with a unique key, ahead of time. Fredman, Komlós and Szemerédi pick a first-level hash table with size   buckets.[2]

To construct, x entries are separated into s buckets by the top-level hashing function, where  . Then for each bucket with k entries, a second-level table is allocated with   slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function) and stored alongside the hash table. If the hash function randomly selected creates a table with collisions, a new hash function is randomly selected until a collision-free table can be guaranteed. Finally, with the collision-free hash, the k entries are hashed into the second-level table.

The quadratic size of the   space ensures that randomly creating a table with collisions is infrequent and independent of the size of k, providing linear amortized construction time. Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected   space, since bucket sizes are small with high probability.[1]

The first-level hash function is specifically chosen so that, for the specific set of x unique key values, the total space T used by all the second-level hash tables has expected   space, and more specifically  . Fredman, Komlós and Szemerédi showed that given a universal hashing family of hash functions, at least half of those functions have that property.[2]

Dynamic case edit

Dietzfelbinger et al. present a dynamic dictionary algorithm that, when a set of n items is incrementally added to the dictionary, membership queries always run in constant time and therefore   worst-case time, the total storage required is   (linear), and   expected amortized insertion and deletion time (amortized constant time).

In the dynamic case, when a key is inserted into the hash table, if its entry in its respective subtable is occupied, then a collision is said to occur and the subtable is rebuilt based on its new total entry count and randomly selected hash function. Because the load factor of the second-level table is kept low  , rebuilding is infrequent, and the amortized expected cost of insertions is  .[2] Similarly, the amortized expected cost of deletions is  .[2]

Additionally, the ultimate sizes of the top-level table or any of the subtables is unknowable in the dynamic case. One method for maintaining expected   space of the table is to prompt a full reconstruction when a sufficient number of insertions and deletions have occurred. By results due to Dietzfelbinger et al.,[2] as long as the total number of insertions or deletions exceeds the number of elements at the time of last construction, the amortized expected cost of insertion and deletion remain   with full rehashing taken into consideration.

The implementation of dynamic perfect hashing by Dietzfelbinger et al. uses these concepts, as well as lazy deletion, and is shown in pseudocode below.

Pseudocode implementation edit

Locate edit

function Locate(x) is j := h(x) if (position hj(x) of subtable Tj contains x (not deleted)) return (x is in S) end if else return (x is not in S) end else end 

Insert edit

During the insertion of a new entry x at j, the global operations counter, count, is incremented.

If x exists at j, but is marked as deleted, then the mark is removed.

If x exists at j or at the subtable Tj, and is not marked as deleted, then a collision is said to occur and the jth bucket's second-level table Tj is rebuilt with a different randomly selected hash function hj.

function Insert(x) is count = count + 1; if (count > M) FullRehash(x); end if else j = h(x); if (Position hj(x) of subtable Tj contains x)  if (x is marked deleted)  remove the delete marker;  end if end if else  bj = bj + 1;  if (bj <= mj)  if position hj(x) of Tj is empty  store x in position hj(x) of Tj;  end if  else  Put all unmarked elements of Tj in list Lj;  Append x to list Lj;  bj = length of Lj;  repeat   hj = randomly chosen function in Hsj;  until hj is injective on the elements of Lj;  for all y on list Lj   store y in position hj(y) of Tj;  end for  end else  end if  else  mj = 2 * max{1, mj};  sj = 2 * mj * (mj - 1);  if the sum total of all sj ≤ 32 * M2 / s(M) + 4 * M  Allocate sj cells for Tj;  Put all unmarked elements of Tj in list Lj;  Append x to list Lj;  bj = length of Lj;  repeat   hj = randomly chosen function in Hsj;  until hj is injective on the elements of Lj;  for all y on list Lj   store y in position hj(y) of Tj;  end for  end if  else  FullRehash(x);  end else  end else end else end else end 

Delete edit

Deletion of x simply flags x as deleted without removal and increments count. In the case of both insertions and deletions, if count reaches a threshold M the entire table is rebuilt, where M is some constant multiple of the size of S at the start of a new phase. Here phase refers to the time between full rebuilds. Note that here the -1 in "Delete(x)" is a representation of an element which is not in the set of all possible elements U.

function Delete(x) is count = count + 1; j = h(x); if position hj(x) of subtable Tj contains x mark x as deleted; end if else return (x is not a member of S); end else if (count >= M) FullRehash(-1); end if end 

Full rebuild edit

A full rebuild of the table of S first starts by removing all elements marked as deleted and then setting the next threshold value M to some constant multiple of the size of S. A hash function, which partitions S into s(M) subsets, where the size of subset j is sj, is repeatedly randomly chosen until:

 

Finally, for each subtable Tj a hash function hj is repeatedly randomly chosen from Hsj until hj is injective on the elements of Tj. The expected time for a full rebuild of the table of S with size n is O(n).[2]

function FullRehash(x) is Put all unmarked elements of T in list L; if (x is in U) append x to L; end if count = length of list L; M = (1 + c) * max{count, 4}; repeat h = randomly chosen function in Hs(M); for all j < s(M)  form a list Lj for h(x) = j;  bj = length of Lj;  mj = 2 * bj;  sj = 2 * mj * (mj - 1); end for until the sum total of all sj ≤ 32 * M2 / s(M) + 4 * M for all j < s(M) Allocate space sj for subtable Tj; repeat  hj = randomly chosen function in Hsj; until hj is injective on the elements of list Lj; end for for all x on list Lj store x in position hj(x) of Tj; end for end 

See also edit

References edit

  1. ^ a b c Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#
  2. ^ a b c d e f g h Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. "Dynamic Perfect Hashing: Upper and Lower Bounds" 2016-03-04 at the Wayback Machine. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370 doi:10.1137/S0097539791194094
  3. ^ Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003.
  4. ^ Yap, Chee. "Universal Construction for the FKS Scheme". New York University. New York University. Retrieved 15 February 2015.[permanent dead link]

dynamic, perfect, hashing, computer, science, dynamic, perfect, hashing, programming, technique, resolving, collisions, hash, table, data, structure, while, more, memory, intensive, than, hash, table, counterparts, citation, needed, this, technique, useful, si. In computer science dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure 1 2 3 While more memory intensive than its hash table counterparts citation needed this technique is useful for situations where fast queries insertions and deletions must be made on a large set of elements Contents 1 Details 1 1 Static case 1 1 1 FKS Scheme 1 2 Dynamic case 2 Pseudocode implementation 2 1 Locate 2 2 Insert 2 3 Delete 2 4 Full rebuild 3 See also 4 ReferencesDetails editStatic case edit FKS Scheme edit Main article static hashing FKS Hashing The problem of optimal static hashing was first solved in general by Fredman Komlos and Szemeredi 4 In their 1984 paper 1 they detail a two tiered hash table scheme in which each bucket of the first level hash table corresponds to a separate second level hash table Keys are hashed twice the first hash value maps to a certain bucket in the first level hash table the second hash value gives the position of that entry in that bucket s second level hash table The second level table is guaranteed to be collision free i e perfect hashing upon construction Consequently the look up cost is guaranteed to be O 1 in the worst case 2 In the static case we are given a set with a total of x entries each one with a unique key ahead of time Fredman Komlos and Szemeredi pick a first level hash table with size s 2 x 1 displaystyle s 2 x 1 nbsp buckets 2 To construct x entries are separated into s buckets by the top level hashing function where s 2 x 1 displaystyle s 2 x 1 nbsp Then for each bucket with k entries a second level table is allocated with k 2 displaystyle k 2 nbsp slots and its hash function is selected at random from a universal hash function set so that it is collision free i e a perfect hash function and stored alongside the hash table If the hash function randomly selected creates a table with collisions a new hash function is randomly selected until a collision free table can be guaranteed Finally with the collision free hash the k entries are hashed into the second level table The quadratic size of the k 2 displaystyle k 2 nbsp space ensures that randomly creating a table with collisions is infrequent and independent of the size of k providing linear amortized construction time Although each second level table requires quadratic space if the keys inserted into the first level hash table are uniformly distributed the structure as a whole occupies expected O n displaystyle O n nbsp space since bucket sizes are small with high probability 1 The first level hash function is specifically chosen so that for the specific set of x unique key values the total space T used by all the second level hash tables has expected O n displaystyle O n nbsp space and more specifically T lt s 4 x displaystyle T lt s 4 cdot x nbsp Fredman Komlos and Szemeredi showed that given a universal hashing family of hash functions at least half of those functions have that property 2 Dynamic case edit Dietzfelbinger et al present a dynamic dictionary algorithm that when a set of n items is incrementally added to the dictionary membership queries always run in constant time and therefore O 1 displaystyle O 1 nbsp worst case time the total storage required is O n displaystyle O n nbsp linear and O 1 displaystyle O 1 nbsp expected amortized insertion and deletion time amortized constant time In the dynamic case when a key is inserted into the hash table if its entry in its respective subtable is occupied then a collision is said to occur and the subtable is rebuilt based on its new total entry count and randomly selected hash function Because the load factor of the second level table is kept low 1 k displaystyle 1 k nbsp rebuilding is infrequent and the amortized expected cost of insertions is O 1 displaystyle O 1 nbsp 2 Similarly the amortized expected cost of deletions is O 1 displaystyle O 1 nbsp 2 Additionally the ultimate sizes of the top level table or any of the subtables is unknowable in the dynamic case One method for maintaining expected O n displaystyle O n nbsp space of the table is to prompt a full reconstruction when a sufficient number of insertions and deletions have occurred By results due to Dietzfelbinger et al 2 as long as the total number of insertions or deletions exceeds the number of elements at the time of last construction the amortized expected cost of insertion and deletion remain O 1 displaystyle O 1 nbsp with full rehashing taken into consideration The implementation of dynamic perfect hashing by Dietzfelbinger et al uses these concepts as well as lazy deletion and is shown in pseudocode below Pseudocode implementation editLocate edit function Locate x is j h x if position hj x of subtable Tj contains x not deleted return x is in S end if else return x is not in S end else end Insert edit During the insertion of a new entry x at j the global operations counter count is incremented If x exists at j but is marked as deleted then the mark is removed If x exists at j or at the subtable Tj and is not marked as deleted then a collision is said to occur and the jth bucket s second level table Tj is rebuilt with a different randomly selected hash function hj function Insert x is count count 1 if count gt M FullRehash x end if else j h x if Position hj x of subtable Tj contains x if x is marked deleted remove the delete marker end if end if else bj bj 1 if bj lt mj if position hj x of Tj is empty store x in position hj x of Tj end if else Put all unmarked elements of Tj in list Lj Append x to list Lj bj length of Lj repeat hj randomly chosen function in Hsj until hj is injective on the elements of Lj for all y on list Lj store y in position hj y of Tj end for end else end if else mj 2 max 1 mj sj 2 mj mj 1 if the sum total of all sj 32 M2 s M 4 M Allocate sj cells for Tj Put all unmarked elements of Tj in list Lj Append x to list Lj bj length of Lj repeat hj randomly chosen function in Hsj until hj is injective on the elements of Lj for all y on list Lj store y in position hj y of Tj end for end if else FullRehash x end else end else end else end else end Delete edit Deletion of x simply flags x as deleted without removal and increments count In the case of both insertions and deletions if count reaches a threshold M the entire table is rebuilt where M is some constant multiple of the size of S at the start of a new phase Here phase refers to the time between full rebuilds Note that here the 1 in Delete x is a representation of an element which is not in the set of all possible elements U function Delete x is count count 1 j h x if position hj x of subtable Tj contains x mark x as deleted end if else return x is not a member of S end else if count gt M FullRehash 1 end if end Full rebuild edit A full rebuild of the table of S first starts by removing all elements marked as deleted and then setting the next threshold value M to some constant multiple of the size of S A hash function which partitions S into s M subsets where the size of subset j is sj is repeatedly randomly chosen until 0 j s M s j 32 M 2 s M 4 M displaystyle sum 0 leq j leq s M s j leq frac 32M 2 s M 4M nbsp Finally for each subtable Tj a hash function hj is repeatedly randomly chosen from Hsj until hj is injective on the elements of Tj The expected time for a full rebuild of the table of S with size n is O n 2 function FullRehash x is Put all unmarked elements of T in list L if x is in U append x to L end if count length of list L M 1 c max count 4 repeat h randomly chosen function in Hs M for all j lt s M form a list Lj for h x j bj length of Lj mj 2 bj sj 2 mj mj 1 end for until the sum total of all sj 32 M2 s M 4 M for all j lt s M Allocate space sj for subtable Tj repeat hj randomly chosen function in Hsj until hj is injective on the elements of list Lj end for for all x on list Lj store x in position hj x of Tj end for endSee also editPerfect hashingReferences edit a b c Fredman M L Komlos J and Szemeredi E 1984 Storing a Sparse Table with 0 1 Worst Case Access Time J ACM 31 3 Jun 1984 538 544 http portal acm org citation cfm id 1884 a b c d e f g h Dietzfelbinger M Karlin A Mehlhorn K Meyer auf der Heide F Rohnert H and Tarjan R E 1994 Dynamic Perfect Hashing Upper and Lower Bounds Archived 2016 03 04 at the Wayback Machine SIAM J Comput 23 4 Aug 1994 738 761 http portal acm org citation cfm id 182370 doi 10 1137 S0097539791194094 Erik Demaine Jeff Lind 6 897 Advanced Data Structures MIT Computer Science and Artificial Intelligence Laboratory Spring 2003 Yap Chee Universal Construction for the FKS Scheme New York University New York University Retrieved 15 February 2015 permanent dead link Retrieved from https en wikipedia org w index php title Dynamic perfect hashing amp oldid 1155056408, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.