Automation of grammar recovery is an important research area that received attention over the last decade and a half. Given the abundance of available documentation for software languages that is only going to keep increasing in the future, there is need for reliable extraction techniques that allow grammar engineers to derive useful information from it. This information can be further used to build grammarware, like parsers or test generators, or to perform grammar investigation. Grammars obtained systematically from existing sources always have preference over manually constructed ones due to traceability of their issues, including errors and design weaknesses. This paper focuses on automated grammar recovery from sources that utilise a family of metasyntaxes known as EBNF: many language specifications extend the well-studied Backus Naur Form in different directions, resulting in unnecessary diversity of syntactic notations. To enable manipulation of EBNF families, we use EDD, the EBNF Dialect Definition, a recently published DSL for notation specification, and base our approach on human-specified indications that guide the subsequent automated heuristic-based recovery process. Two separate scenarios are considered in the paper: a reliable syntactic notation and an unreliable one, with the latter being remarkably more difficult to handle, but also substantially more useful since it is so often encountered in practice. The proposed approach has been verified by two prototypes that were capable of extracting dozens of grammars written in 42 different syntactic notations.
, ,
A.M. Sloane , S. Andova
GrammarLab: Foundations of a Grammar Laboratory
Workshop on Language Descriptions, Tools and Applications
Software Analysis and Transformation

Zaytsev, V. (2012). Notation-Parametric Grammar Recovery. In A.M Sloane & S Andova (Eds.), Post-proceedings of the 12th International Workshop on Language Descriptions, Tools and Applications. ACM.