fbpx
Wikipedia

Garsia–Wachs algorithm

The Garsia–Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes, in linearithmic time. It is named after Adriano Garsia and Michelle L. Wachs.

Problem description edit

The input to the problem, for an integer  , consists of a sequence of   non-negative weights  . The output is a rooted binary tree with   internal nodes, each having exactly two children. Such a tree has exactly   leaf nodes, which can be identified (in the order given by the binary tree) with the   input weights. The goal of the problem is to find a tree, among all of the possible trees with   internal nodes, that minimizes the weighted sum of the external path lengths. These path lengths are the numbers of steps from the root to each leaf. They are multiplied by the weight of the leaf and then summed to give the quality of the overall tree.[1]

This problem can be interpreted as a problem of constructing a binary search tree for   ordered keys, with the assumption that the tree will be used only to search for values that are not already in the tree. In this case, the   keys partition the space of search values into   intervals, and the weight of one of these intervals can be taken as the probability of searching for a value that lands in that interval. The weighted sum of external path lengths controls the expected time for searching the tree.[1]

Alternatively, the output of the problem can be used as a Huffman code, a method for encoding   given values unambiguously by using variable-length sequences of binary values. In this interpretation, the code for a value is given by the sequence of left and right steps from a parent to the child on the path from the root to a leaf in the tree (e.g. with 0 for left and 1 for right). Unlike standard Huffman codes, the ones constructed in this way are alphabetical, meaning that the sorted order of these binary codes is the same as the input ordering of the values. If the weight of a value is its frequency in a message to be encoded, then the output of the Garsia–Wachs algorithm is the alphabetical Huffman code that compresses the message to the shortest possible length.[1]

Algorithm edit

 
The binary tree constructed in the first phase of the algorithm by finding and merging out-of-order triples in the input sequence (left) and the output of the algorithm, a correctly-ordered binary tree whose leaves are at the same levels as the other tree

Overall, the algorithm consists of three phases:[1]

  1. Build a binary tree having the values as leaves but possibly in the wrong order.
  2. Compute each leaf's distance from the root in the resulting tree.
  3. Build another binary tree with the leaves at the same distances but in the correct order.

The first phase of the algorithm is easier to describe if the input is augmented with two sentinel values,   (or any sufficiently large finite value) at the start and end of the sequence.[2]

The first phase maintains a forest of trees, initially a single-node tree for each non-sentinel input weight, which will eventually become the binary tree that it constructs. Each tree is associated with a value, the sum of the weights of its leaves makes a tree node for each non-sentinel input weight. The algorithm maintains a sequence of these values, with the two sentinel values at each end. The initial sequence is just the order in which the leaf weights were given as input. It then repeatedly performs the following steps, each of which reduces the length of the input sequence, until there is only one tree containing all the leaves:[1]

  • Find the first three consecutive weights  ,  , and   in the sequence for which  . There always exists such a triple, because the final sentinel value is larger than any previous two finite values.
  • Remove   and   from the sequence, and make a new tree node to be the parent of the nodes for   and  . Its value is  .
  • Reinsert the new node immediately after the rightmost earlier position whose value is greater than or equal to  . There always exists such a position, because of the left sentinel value.

To implement this phase efficiently, the algorithm can maintain its current sequence of values in any self-balancing binary search tree structure. Such a structure allows the removal of   and  , and the reinsertion of their new parent, in logarithmic time. In each step, the weights up to   in the even positions of the array form a decreasing sequence, and the weights in the odd positions form another decreasing sequence. Therefore, the position to reinsert   may be found in logarithmic time by using the balanced tree to perform two binary searches, one for each of these two decreasing sequences. The search for the first position for which   can be performed in linear total time by using a sequential search that begins at the   from the previous triple.[1]

It is nontrivial to prove that, in the third phase of the algorithm, another tree with the same distances exists and that this tree provides the optimal solution to the problem. But assuming this to be true, the second and third phases of the algorithm are straightforward to implement in linear time. Therefore, the total time for the algorithm, on an input of length  , is  .

History edit

The Garsia–Wachs algorithm is named after Adriano Garsia and Michelle L. Wachs, who published it in 1977.[1][3] Their algorithm simplified an earlier method of T. C. Hu and Alan Tucker,[1][4] and (although it is different in internal details) it ends up making the same comparisons in the same order as the Hu–Tucker algorithm.[5] The original proof of correctness of the Garsia–Wachs algorithm was complicated, and was later simplified by Kingston (1988)[1][2] and Karpinski, Larmore & Rytter (1997).[6]

References edit

  1. ^ a b c d e f g h i Knuth, Donald E. (1998), "Algorithm G (Garsia–Wachs algorithm for optimum binary trees)", The Art of Computer Programming, Vol. 3: Sorting and Searching (2nd ed.), Addison–Wesley, pp. 451–453. See also History and bibliography, pp. 453–454.
  2. ^ a b Kingston, Jeffrey H. (1988), "A new proof of the Garsia–Wachs algorithm", Journal of Algorithms, 9 (1): 129–136, doi:10.1016/0196-6774(88)90009-0, MR 0925602
  3. ^ Garsia, Adriano M.; Wachs, Michelle L. (1977), "A new algorithm for minimum cost binary trees", SIAM Journal on Computing, 6 (4): 622–642, doi:10.1137/0206045, MR 0520738
  4. ^ Hu, T. C.; Tucker, A. C. (1971), "Optimal computer search trees and variable-length alphabetical codes", SIAM Journal on Applied Mathematics, 21 (4): 514–532, doi:10.1137/0121057, MR 0304063
  5. ^ Mehlhorn, Kurt; Tsagarakis, Marcos (1979), "On the isomorphism of two algorithms: Hu/Tucker and Garsia/Wachs", Les arbres en algèbre et en programmation (4ème Colloq., Lille, 1979), Univ. Lille I, Lille, pp. 159–172, MR 0554347
  6. ^ Karpinski, Marek; Larmore, Lawrence L.; Rytter, Wojciech (1997), "Correctness of constructing optimal alphabetic trees revisited", Theoretical Computer Science, 180 (1–2): 309–324, doi:10.1016/S0304-3975(96)00296-4, MR 1453872

Further reading edit

  • Filliatre, Jean-Christophe (2008), "A functional implementation of the Garsia–Wachs algorithm (functional pearl)", Proceedings of the 2008 ACM SIGPLAN Workshop on ML (ML '08), New York, NY, USA: Association for Computing Machinery, pp. 91–96, doi:10.1145/1411304.1411317, ISBN 978-1-60558-062-3, S2CID 1541092

garsia, wachs, algorithm, efficient, method, computers, construct, optimal, binary, search, trees, alphabetic, huffman, codes, linearithmic, time, named, after, adriano, garsia, michelle, wachs, contents, problem, description, algorithm, history, references, f. The Garsia Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes in linearithmic time It is named after Adriano Garsia and Michelle L Wachs Contents 1 Problem description 2 Algorithm 3 History 4 References 5 Further readingProblem description editThe input to the problem for an integer n displaystyle n nbsp consists of a sequence of n 1 displaystyle n 1 nbsp non negative weights w 0 w 1 w n displaystyle w 0 w 1 dots w n nbsp The output is a rooted binary tree with n displaystyle n nbsp internal nodes each having exactly two children Such a tree has exactly n 1 displaystyle n 1 nbsp leaf nodes which can be identified in the order given by the binary tree with the n 1 displaystyle n 1 nbsp input weights The goal of the problem is to find a tree among all of the possible trees with n displaystyle n nbsp internal nodes that minimizes the weighted sum of the external path lengths These path lengths are the numbers of steps from the root to each leaf They are multiplied by the weight of the leaf and then summed to give the quality of the overall tree 1 This problem can be interpreted as a problem of constructing a binary search tree for n displaystyle n nbsp ordered keys with the assumption that the tree will be used only to search for values that are not already in the tree In this case the n displaystyle n nbsp keys partition the space of search values into n 1 displaystyle n 1 nbsp intervals and the weight of one of these intervals can be taken as the probability of searching for a value that lands in that interval The weighted sum of external path lengths controls the expected time for searching the tree 1 Alternatively the output of the problem can be used as a Huffman code a method for encoding n 1 displaystyle n 1 nbsp given values unambiguously by using variable length sequences of binary values In this interpretation the code for a value is given by the sequence of left and right steps from a parent to the child on the path from the root to a leaf in the tree e g with 0 for left and 1 for right Unlike standard Huffman codes the ones constructed in this way are alphabetical meaning that the sorted order of these binary codes is the same as the input ordering of the values If the weight of a value is its frequency in a message to be encoded then the output of the Garsia Wachs algorithm is the alphabetical Huffman code that compresses the message to the shortest possible length 1 Algorithm edit nbsp The binary tree constructed in the first phase of the algorithm by finding and merging out of order triples in the input sequence left and the output of the algorithm a correctly ordered binary tree whose leaves are at the same levels as the other tree Overall the algorithm consists of three phases 1 Build a binary tree having the values as leaves but possibly in the wrong order Compute each leaf s distance from the root in the resulting tree Build another binary tree with the leaves at the same distances but in the correct order The first phase of the algorithm is easier to describe if the input is augmented with two sentinel values displaystyle infty nbsp or any sufficiently large finite value at the start and end of the sequence 2 The first phase maintains a forest of trees initially a single node tree for each non sentinel input weight which will eventually become the binary tree that it constructs Each tree is associated with a value the sum of the weights of its leaves makes a tree node for each non sentinel input weight The algorithm maintains a sequence of these values with the two sentinel values at each end The initial sequence is just the order in which the leaf weights were given as input It then repeatedly performs the following steps each of which reduces the length of the input sequence until there is only one tree containing all the leaves 1 Find the first three consecutive weights x displaystyle x nbsp y displaystyle y nbsp and z displaystyle z nbsp in the sequence for which x z displaystyle x leq z nbsp There always exists such a triple because the final sentinel value is larger than any previous two finite values Remove x displaystyle x nbsp and y displaystyle y nbsp from the sequence and make a new tree node to be the parent of the nodes for x displaystyle x nbsp and y displaystyle y nbsp Its value is x y displaystyle x y nbsp Reinsert the new node immediately after the rightmost earlier position whose value is greater than or equal to x y displaystyle x y nbsp There always exists such a position because of the left sentinel value To implement this phase efficiently the algorithm can maintain its current sequence of values in any self balancing binary search tree structure Such a structure allows the removal of x displaystyle x nbsp and y displaystyle y nbsp and the reinsertion of their new parent in logarithmic time In each step the weights up to y displaystyle y nbsp in the even positions of the array form a decreasing sequence and the weights in the odd positions form another decreasing sequence Therefore the position to reinsert x y displaystyle x y nbsp may be found in logarithmic time by using the balanced tree to perform two binary searches one for each of these two decreasing sequences The search for the first position for which x z displaystyle x leq z nbsp can be performed in linear total time by using a sequential search that begins at the z displaystyle z nbsp from the previous triple 1 It is nontrivial to prove that in the third phase of the algorithm another tree with the same distances exists and that this tree provides the optimal solution to the problem But assuming this to be true the second and third phases of the algorithm are straightforward to implement in linear time Therefore the total time for the algorithm on an input of length n displaystyle n nbsp is O n log n displaystyle O n log n nbsp History editThe Garsia Wachs algorithm is named after Adriano Garsia and Michelle L Wachs who published it in 1977 1 3 Their algorithm simplified an earlier method of T C Hu and Alan Tucker 1 4 and although it is different in internal details it ends up making the same comparisons in the same order as the Hu Tucker algorithm 5 The original proof of correctness of the Garsia Wachs algorithm was complicated and was later simplified by Kingston 1988 1 2 and Karpinski Larmore amp Rytter 1997 6 References edit a b c d e f g h i Knuth Donald E 1998 Algorithm G Garsia Wachs algorithm for optimum binary trees The Art of Computer Programming Vol 3 Sorting and Searching 2nd ed Addison Wesley pp 451 453 See also History and bibliography pp 453 454 a b Kingston Jeffrey H 1988 A new proof of the Garsia Wachs algorithm Journal of Algorithms 9 1 129 136 doi 10 1016 0196 6774 88 90009 0 MR 0925602 Garsia Adriano M Wachs Michelle L 1977 A new algorithm for minimum cost binary trees SIAM Journal on Computing 6 4 622 642 doi 10 1137 0206045 MR 0520738 Hu T C Tucker A C 1971 Optimal computer search trees and variable length alphabetical codes SIAM Journal on Applied Mathematics 21 4 514 532 doi 10 1137 0121057 MR 0304063 Mehlhorn Kurt Tsagarakis Marcos 1979 On the isomorphism of two algorithms Hu Tucker and Garsia Wachs Les arbres en algebre et en programmation 4eme Colloq Lille 1979 Univ Lille I Lille pp 159 172 MR 0554347 Karpinski Marek Larmore Lawrence L Rytter Wojciech 1997 Correctness of constructing optimal alphabetic trees revisited Theoretical Computer Science 180 1 2 309 324 doi 10 1016 S0304 3975 96 00296 4 MR 1453872Further reading editFilliatre Jean Christophe 2008 A functional implementation of the Garsia Wachs algorithm functional pearl Proceedings of the 2008 ACM SIGPLAN Workshop on ML ML 08 New York NY USA Association for Computing Machinery pp 91 96 doi 10 1145 1411304 1411317 ISBN 978 1 60558 062 3 S2CID 1541092 Retrieved from https en wikipedia org w index php title Garsia Wachs algorithm amp oldid 1187704784, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.