ABSTRACT
Parsing expression grammars (PEGs) are a powerful and popular foundation for describing syntax. Despite PEGs' expressiveness, they cannot recognize many syntax patterns of popular programming languages. Typical examples include typedef-defined names in C/C++ and here documents appearing in many scripting languages. We use a single unified state representation, called a symbol table, to capture various context-sensitive patterns. Over the symbol table, we design a small set of restricted semantic predicates and actions. The extended PEGs are called SPEGs, and are designed to be safe in contexts of backtracking and the linear time guarantee of packrat parsing. This paper will show that SPEGs have improved the expressive power in such ways that they recognize practical context-sensitive grammars, including back referencing, indentation-based code layout, and contextual keywords.
- Michael D. Adams. 2013. Principled Parsing for Indentation-sensitive Languages: Revisiting Landin’s Offside Rule. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’13). ACM, New York, NY, USA, 511–522. Google ScholarDigital Library
- Michael D. Adams and Ömer S. Ağacan. 2014. Indentation-sensitive Parsing for Parsec. In Proceedings of the 2014 ACM SIGPLAN Symposium on Haskell (Haskell ’14). ACM, New York, NY, USA, 121–132. Google ScholarDigital Library
- Ali Afroozeh and Anastasia Izmaylova. 2015. One Parser to Rule Them All. In 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward!) (Onward! 2015). ACM, New York, NY, USA, 151–170. Google ScholarDigital Library
- Martin Bravenboer, Éric Tanter, and Eelco Visser. 2006. Declarative, Formal, and Extensible Syntax Definition for aspectJ. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications (OOPSLA ’06). ACM, New York, NY, USA, 209–228. Google ScholarDigital Library
- Bryan Ford. 2002. Packrat parsing: a practical Linear-time algorithm with backtracking. Master’s thesis. Massachusetts Insitutute of Technology.Google Scholar
- Bryan Ford. 2002. Packrat Parsing:: Simple, Powerful, Lazy, Linear Time, Functional Pearl. In Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming (ICFP ’02). ACM, New York, NY, USA, 36–47. Google ScholarDigital Library
- Bryan Ford. 2004. Parsing Expression Grammars: A Recognition-based Syntactic Foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’04). ACM, New York, NY, USA, 111–122. Google ScholarDigital Library
- Robert Grimm. 2006. Better Extensibility Through Modular Syntax. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’06). ACM, New York, NY, USA, 38–51. Google ScholarDigital Library
- Futoshi Iwama, Taiga Nakamura, and Hironori Takeuchi. 2012. Constructing Parser for Industrial Software Specifications Containing Formal and Natural Language Description. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 1012–1021. http://dl.acm.org/citation.cfm?id= 2337223.2337353Google ScholarDigital Library
- Trevor Jim, Yitzhak Mandelbaum, and David Walker. 2010. Semantics and Algorithms for Data-dependent Grammars. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’10). ACM, New York, NY, USA, 417–430. Google ScholarDigital Library
- Stephen C Johnson. 1975. Yacc: Yet another compiler-compiler. Vol. 32. ell Laboratories Murray Hill, NJ.Google Scholar
- Manohar Jonnalagedda, Thierry Coppey, Sandro Stucki, Tiark Rompf, and Martin Odersky. 2014. Staged Parser Combinators for Efficient Data Processing. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA ’14). ACM, New York, NY, USA, 637–653. Google ScholarDigital Library
- Lennart C.L. Kats, Eelco Visser, and Guido Wachsmuth. 2010. Pure and Declarative Syntax Definition: Paradise Lost and Regained. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’10). ACM, New York, NY, USA, 918–932. Google ScholarDigital Library
- Kimio Kuramitsu. 2015. Packrat Parsing with Elastic Sliding Window. Journal of Infomration Processing 23, 4 (2015), 505–512. Google ScholarCross Ref
- Kimio Kuramitsu. 2016. Nez: Practical Open Grammar Language. In Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! 2016). ACM, New York, NY, USA, 29–42. Google ScholarDigital Library
- Nicolas Laurent and Kim Mens. 2016. Taming Context-sensitive Languages with Principled Stateful Parsing. In Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering (SLE 2016). ACM, New York, NY, USA, 15–27. Google ScholarDigital Library
- Terence Parr and Kathleen Fisher. 2011. LL(*): The Foundation of the ANTLR Parser Generator. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). ACM, New York, NY, USA, 425–436. Google ScholarDigital Library
- Sukyoung Ryu. 2009. Parsing Fortress Syntax. In Proceedings of the 7th International Conference on Principles and Practice of Programming in Java (PPPJ ’09). ACM, New York, NY, USA, 76–84. Google ScholarDigital Library
- Eric R. Van Wyk and August C. Schwerdfeger. 2007. Context-aware Scanning for Parsing Extensible Languages. In Proceedings of the 6th International Conference on Generative Programming and Component Engineering (GPCE ’07). ACM, New York, NY, USA, 63–72. Google ScholarDigital Library
- Alessandro Warth and Ian Piumarta. 2007. OMeta: An Object-oriented Language for Pattern Matching. In Proceedings of the 2007 Symposium on Dynamic Languages (DLS ’07). ACM, New York, NY, USA, 11–19. Google ScholarDigital Library
Index Terms
- A symbol-based extension of parsing expression grammars and context-sensitive packrat parsing
Recommendations
Parsing expression grammars: a recognition-based syntactic foundation
POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languagesFor decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ...
Parsing expression grammars made practical
SLE 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Software Language EngineeringParsing Expression Grammars (PEGs) define languages by specifying a recursive-descent parser that recognises them. The PEG formalism exhibits desirable properties, such as closure under composition, built-in disambiguation, unification of syntactic and ...
Left recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all deterministic context-free languages through a set of rules that specify a top-down parser for some language. PEGs are easy to use, and there are efficient implementations of PEG ...
Comments