Automation of grammar recovery is an important research area that has received attention over the last decade and a half. Given the abundance of available documentation for software languages, which is only going to keep increasing in the future, there is a need for reliable extraction techniques that allow grammar engineers to derive useful information from it. This information can be further used to build grammarware, such as parsers or test generators, or to perform grammar investigation. Grammars obtained systematically from existing sources are always preferred over manually constructed ones because their issues, including errors and design weaknesses, remain traceable. This paper focuses on automated grammar recovery from sources that utilise a family of metasyntaxes known as EBNF: many language specifications extend the well-studied Backus-Naur Form in different directions, resulting in unnecessary diversity of syntactic notations. To enable manipulation of EBNF families, we use EDD, the EBNF Dialect Definition, a recently published DSL for notation specification, and base our approach on human-specified indications that guide the subsequent automated heuristic-based recovery process. Two separate scenarios are considered in the paper: a reliable syntactic notation and an unreliable one, with the latter being remarkably more difficult to handle, but also substantially more useful, since it is so often encountered in practice. The proposed approach has been verified by two prototypes that were capable of extracting dozens of grammars written in 42 different syntactic notations.
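As a rough illustration of what "notation-parametric" means here, the following minimal Python sketch treats the syntactic notation as data passed to a single extraction routine, so that the same code recovers the same grammar from two differently written sources. It is only a sketch under stated assumptions: the dictionary keys (`defining`, `terminator`, `alternative`), the two example notations, and the function `recover` are hypothetical names chosen for illustration and are not the EDD vocabulary or the prototypes described in the paper. It also assumes a reliable notation in which metasymbols never occur inside terminals; no heuristics for the unreliable scenario are attempted.

```python
# Hypothetical notation descriptions: each maps a metasymbol role to the
# concrete text used for it. These keys are illustrative, not actual EDD.
BNF_LIKE = {"defining": "::=", "terminator": ";", "alternative": "|"}
WIRTH_LIKE = {"defining": "=", "terminator": ".", "alternative": "|"}


def recover(text, notation):
    """Split raw grammar text into productions according to a notation.

    Returns a dict mapping each nonterminal name to a list of alternatives,
    where every alternative is a list of whitespace-separated symbols.
    Assumes metasymbols never appear inside terminal symbols.
    """
    grammar = {}
    for rule in text.split(notation["terminator"]):
        rule = rule.strip()
        if not rule:
            continue  # skip the empty chunk after the last terminator
        # Everything before the first defining symbol is the rule name.
        name, _, body = rule.partition(notation["defining"])
        alternatives = [alt.split() for alt in body.split(notation["alternative"])]
        grammar[name.strip()] = alternatives
    return grammar


# The same two rules written in the two hypothetical notations.
BNF_SOURCE = 'expr ::= term "+" expr | term ; term ::= "x" ;'
WIRTH_SOURCE = 'expr = term "+" expr | term . term = "x" .'

from_bnf = recover(BNF_SOURCE, BNF_LIKE)
from_wirth = recover(WIRTH_SOURCE, WIRTH_LIKE)
```

Under these assumptions both calls should yield the same internal representation, which is the point of parametrising the recovery by notation rather than writing one extractor per dialect.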
Additional Metadata
Keywords grammar recovery, grammarware, syntactic notation
ACM Formal Definitions and Theory (acm D.3.1)
MSC Grammars and rewriting systems (msc 68Q42)
THEME Software (theme 1)
Publisher Institute of Cybernetics at Tallinn University of Technology
Editor A.M. Sloane, S. Andova
Project GrammarLab: Foundations of a Grammar Laboratory
Conference Workshop on Language Descriptions, Tools and Applications
Grant This work was funded by The Netherlands Organisation for Scientific Research (NWO); grant id nwo/612.001.007 - GrammarLab: Foundations of a Grammar Laboratory
Zaytsev, V. (2012). Notation-Parametric Grammar Recovery. In A.M. Sloane & S. Andova (Eds.), Pre-proceedings of the 12th International Workshop on Language Descriptions, Tools, and Applications (LDTA 2012) (pp. 105–118). Institute of Cybernetics at Tallinn University of Technology.