Several real-world problems call for more parsing power than is offered by the widely used and well-established deterministic parsing techniques. These techniques also create an artificial divide between lexical and context-free analysis phases, at the cost of significant complexity at their interface. In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Disambiguation constructs can be interpreted as filters on parse forests. Depending on the kind of disambiguation, filters can be applied at parser generation time, at parse time, or after parsing. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.

, , , ,
CWI
Software Engineering [SEN]
Software Analysis and Transformation

van den Brand, M., Scheerder, J., Vinju, J., & Visser, E. (2001). Disambiguation filters for scannerless generalized LR parsers. Software Engineering [SEN]. CWI.