fbpx
Wikipedia

GLR parser

A GLR parser (generalized left-to-right rightmost derivation parser) is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars.[1] The theoretical foundation was provided in a 1974 paper[2] by Bernard Lang (along with other general context-free parsers such as GLL). It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work,[3] though in practice it is the second stage that is recognized as the GLR parser.

Though the algorithm has evolved since its original forms, the principles have remained intact. As shown by an earlier publication,[4] Lang was primarily interested in more easily used and more flexible parsers for extensible programming languages. Tomita's goal was to parse natural language text thoroughly and efficiently. Standard LR parsers cannot accommodate the nondeterministic and ambiguous nature of natural language, and the GLR algorithm can.

Algorithm edit

Briefly, the GLR algorithm works in a manner similar to the LR parser algorithm, except that, given a particular grammar, a GLR parser will process all possible interpretations of a given input in a breadth-first search. On the front-end, a GLR parser generator converts an input grammar into parser tables, in a manner similar to an LR generator. However, where LR parse tables allow for only one state transition (given a state and an input token), GLR parse tables allow for multiple transitions. In effect, GLR allows for shift/reduce and reduce/reduce conflicts.

When a conflicting transition is encountered, the parse stack is forked into two or more parallel parse stacks, where the state corresponding to each possible transition is at the top. Then, the next input token is read and used to determine the next transition(s) for each of the "top" states – and further forking can occur. If any given top state and input token do not result in at least one transition, then that "path" through the parse tables is invalid and can be discarded.

A crucial optimization known as a graph-structured stack allows sharing of common prefixes and suffixes of these stacks, which constrains the overall search space and memory usage required to parse input text. The complex structures that arise from this improvement make the search graph a directed acyclic graph (with additional restrictions on the "depths" of various nodes), rather than a tree.

Advantages edit

Recognition using the GLR algorithm has the same worst-case time complexity as the CYK algorithm and Earley algorithm: O(n3).[citation needed] However, GLR carries two additional advantages:

  • The time required to run the algorithm is proportional to the degree of nondeterminism in the grammar: on deterministic grammars the GLR algorithm runs in O(n) time (this is not true of the Earley[citation needed] and CYK algorithms, but the original Earley algorithms can be modified to ensure it)
  • The GLR algorithm is "online" – that is, it consumes the input tokens in a specific order and performs as much work as possible after consuming each token (also true for Earley).

In practice, the grammars of most programming languages are deterministic or "nearly deterministic", meaning that any nondeterminism is usually resolved within a small (though possibly unbounded) number of tokens[citation needed]. Compared to other algorithms capable of handling the full class of context-free grammars (such as Earley parser or CYK algorithm), the GLR algorithm gives better performance on these "nearly deterministic" grammars, because only a single stack will be active during the majority of the parsing process.

GLR can be combined with the LALR(1) algorithm, in a hybrid parser, allowing still higher performance.[5]

See also edit

References edit

  1. ^ Masaru Tomita (6 December 2012). Generalized LR Parsing. Springer Science & Business Media. ISBN 978-1-4615-4034-2.
  2. ^ Lang, Bernard (1974). "Deterministic techniques for efficient non-deterministic parsers". In Loeckx, J. (ed.). Automata, Languages and Programming. Lecture Notes in Computer Science. Vol. 14. Saarbrücken: Springer. pp. 255–269. doi:10.1007/3-540-06841-4_65. ISBN 978-3-540-06841-9. ISSN 0302-9743.
  3. ^ Masaru Tomita. Efficient parsing for natural language. Kluwer Academic Publishers, Boston, 1986.
  4. ^ Lang, Bernard (December 1971). "Parallel non-deterministic bottom-up parsing". ACM SIGPLAN Notices. Proceedings of the international symposium on Extensible languages. 6 (12): 56–57. doi:10.1145/942582.807982.
  5. ^ "Elkhound, Elsa and Cqual++: Open-Source Static Analysis for C++". YouTube. Archived from the original on 2021-12-21.

Further reading edit

  • Grune, Dick; Jacobs, Ceriel J.H. (2008). Parsing Techniques. Springer Science+Business Media. ISBN 978-0-387-20248-8.
  • Tomita, Masaru (1984). "LR parsers for natural languages". COLING. 10th International Conference on Computational Linguistics. pp. 354–357.
  • Tomita, Masaru (1985). "An efficient context-free parsing algorithm for natural languages". IJCAI. International Joint Conference on Artificial Intelligence. pp. 756–764.

parser, this, article, multiple, issues, please, help, improve, discuss, these, issues, talk, page, learn, when, remove, these, template, messages, this, article, includes, list, general, references, lacks, sufficient, corresponding, inline, citations, please,. This article has multiple issues Please help improve it or discuss these issues on the talk page Learn how and when to remove these template messages This article includes a list of general references but it lacks sufficient corresponding inline citations Please help to improve this article by introducing more precise citations May 2011 Learn how and when to remove this message The article s lead section may need to be rewritten Please help improve the lead and read the lead layout guide February 2015 Learn how and when to remove this message Learn how and when to remove this message A GLR parser generalized left to right rightmost derivation parser is an extension of an LR parser algorithm to handle non deterministic and ambiguous grammars 1 The theoretical foundation was provided in a 1974 paper 2 by Bernard Lang along with other general context free parsers such as GLL It describes a systematic way to produce such algorithms and provides uniform results regarding correctness proofs complexity with respect to grammar classes and optimization techniques The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita it has also been referred to as a parallel parser Tomita presented five stages in his original work 3 though in practice it is the second stage that is recognized as the GLR parser Though the algorithm has evolved since its original forms the principles have remained intact As shown by an earlier publication 4 Lang was primarily interested in more easily used and more flexible parsers for extensible programming languages Tomita s goal was to parse natural language text thoroughly and efficiently Standard LR parsers cannot accommodate the nondeterministic and ambiguous nature of natural language and the GLR algorithm can Contents 1 Algorithm 2 Advantages 3 See also 4 References 5 Further readingAlgorithm editBriefly the GLR algorithm works in a manner similar to the LR parser algorithm except that given a particular grammar a GLR parser will process all possible interpretations of a given input in a breadth first search On the front end a GLR parser generator converts an input grammar into parser tables in a manner similar to an LR generator However where LR parse tables allow for only one state transition given a state and an input token GLR parse tables allow for multiple transitions In effect GLR allows for shift reduce and reduce reduce conflicts When a conflicting transition is encountered the parse stack is forked into two or more parallel parse stacks where the state corresponding to each possible transition is at the top Then the next input token is read and used to determine the next transition s for each of the top states and further forking can occur If any given top state and input token do not result in at least one transition then that path through the parse tables is invalid and can be discarded A crucial optimization known as a graph structured stack allows sharing of common prefixes and suffixes of these stacks which constrains the overall search space and memory usage required to parse input text The complex structures that arise from this improvement make the search graph a directed acyclic graph with additional restrictions on the depths of various nodes rather than a tree Advantages editRecognition using the GLR algorithm has the same worst case time complexity as the CYK algorithm and Earley algorithm O n3 citation needed However GLR carries two additional advantages The time required to run the algorithm is proportional to the degree of nondeterminism in the grammar on deterministic grammars the GLR algorithm runs in O n time this is not true of the Earley citation needed and CYK algorithms but the original Earley algorithms can be modified to ensure it The GLR algorithm is online that is it consumes the input tokens in a specific order and performs as much work as possible after consuming each token also true for Earley In practice the grammars of most programming languages are deterministic or nearly deterministic meaning that any nondeterminism is usually resolved within a small though possibly unbounded number of tokens citation needed Compared to other algorithms capable of handling the full class of context free grammars such as Earley parser or CYK algorithm the GLR algorithm gives better performance on these nearly deterministic grammars because only a single stack will be active during the majority of the parsing process GLR can be combined with the LALR 1 algorithm in a hybrid parser allowing still higher performance 5 See also editComparison of parser generators DMS Software Reengineering Toolkit GNU Bison a parser generator that can create LALR and GLR parsers Packrat parser another parse that can parse ambiguous and nondeterministic languagesReferences edit Masaru Tomita 6 December 2012 Generalized LR Parsing Springer Science amp Business Media ISBN 978 1 4615 4034 2 Lang Bernard 1974 Deterministic techniques for efficient non deterministic parsers In Loeckx J ed Automata Languages and Programming Lecture Notes in Computer Science Vol 14 Saarbrucken Springer pp 255 269 doi 10 1007 3 540 06841 4 65 ISBN 978 3 540 06841 9 ISSN 0302 9743 Masaru Tomita Efficient parsing for natural language Kluwer Academic Publishers Boston 1986 Lang Bernard December 1971 Parallel non deterministic bottom up parsing ACM SIGPLAN Notices Proceedings of the international symposium on Extensible languages 6 12 56 57 doi 10 1145 942582 807982 Elkhound Elsa and Cqual Open Source Static Analysis for C YouTube Archived from the original on 2021 12 21 Further reading editGrune Dick Jacobs Ceriel J H 2008 Parsing Techniques Springer Science Business Media ISBN 978 0 387 20248 8 Tomita Masaru 1984 LR parsers for natural languages COLING 10th International Conference on Computational Linguistics pp 354 357 Tomita Masaru 1985 An efficient context free parsing algorithm for natural languages IJCAI International Joint Conference on Artificial Intelligence pp 756 764 Retrieved from https en wikipedia org w index php title GLR parser amp oldid 1223656901, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.