Source model extraction, the automated extraction of information from system artifacts, is a common phase in reverse engineering tools. One of the major challenges of this phase is creating extractors that can deal with the irregularities in artifacts that are typical of the reverse engineering domain (for example, syntactic errors, incomplete source code, language dialects and embedded languages). This paper proposes a solution in the form of island grammars, a special kind of grammar that combines the detailed specification possibilities of grammars with the liberal behavior of lexical approaches. We show how island grammars can be used to generate robust parsers that combine the accuracy of syntactic analysis with the speed, flexibility and tolerance usually found only in lexical analysis. We conclude with a discussion of the development of Mangrove, a generator for source model extractors based on island grammars, and describe its application to a number of case studies.
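To make the island/water idea concrete, here is a minimal hand-written sketch in Python. It is not the paper's method and not Mangrove, which generates extractors from a declarative grammar. The sketch has one detailed production for the constructs of interest (the "island") and a catch-all rule that consumes any other token as "water", so unknown or malformed input is skipped rather than rejected. The choice of COBOL `CALL` statements as the island, the token pattern, and the names `CallIsland`, `parse_call`, `extract` and `SAMPLE` are all hypothetical illustrations.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexical layer: quoted string literals, COBOL-style words
# (letters, digits, hyphens), or any other single non-blank character.
# Literals are matched first, so a CALL inside a string is not a keyword.
TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|[A-Za-z0-9][A-Za-z0-9-]*|\S")
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
STOP_WORDS = {"CALL", "USING", "END-CALL"}


@dataclass
class CallIsland:
    """Fact extracted from one island: the called program and its arguments."""
    program: str
    arguments: list[str] = field(default_factory=list)


def is_literal(tok: str) -> bool:
    return len(tok) >= 2 and tok[0] in "'\"" and tok[-1] == tok[0]


def is_identifier(tok: str) -> bool:
    return IDENT_RE.fullmatch(tok) is not None and tok.upper() not in STOP_WORDS


def parse_call(tokens: list[str], i: int):
    """Island production:  Call ::= "CALL" Literal ("USING" Identifier+)?
    Returns (CallIsland, next_index), or None if it does not match at i."""
    if i >= len(tokens) or tokens[i].upper() != "CALL":
        return None
    j = i + 1
    if j >= len(tokens) or not is_literal(tokens[j]):
        return None
    island = CallIsland(program=tokens[j][1:-1])
    j += 1
    if j < len(tokens) and tokens[j].upper() == "USING":
        k = j + 1
        while k < len(tokens) and is_identifier(tokens[k]):
            k += 1
        if k > j + 1:  # accept USING only when at least one identifier follows
            island.arguments = tokens[j + 1:k]
            j = k
    return island, j


def extract(text: str) -> list[CallIsland]:
    """Program ::= (Call | Water)*  where Water matches any single token."""
    tokens = TOKEN_RE.findall(text)
    islands, i = [], 0
    while i < len(tokens):
        result = parse_call(tokens, i)
        if result is None:
            i += 1  # no island starts here: skip one token as water
        else:
            island, i = result
            islands.append(island)
    return islands


SAMPLE = """
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEMO.
      * dialect noise @@ {{ that a complete COBOL grammar might reject
       PROCEDURE DIVISION.
           CALL 'PAYROLL' USING EMP-REC TOTAL.
           DISPLAY 'CALL ME'.
           CALL "LOGGER".
           CALL MISSING-LITERAL.
"""

if __name__ == "__main__":
    for island in extract(SAMPLE):
        print(island)
```

Running the script reports two islands: `PAYROLL` with arguments `EMP-REC` and `TOTAL`, and `LOGGER` with no arguments. The text `'CALL ME'` is not reported because literals are tokenized first, and `CALL MISSING-LITERAL` falls back to water instead of causing a parse error. A plain lexical search for `CALL` would match all four lines, while a parser for full COBOL would first have to accept every other construct in the file. Trying the island production first and otherwise skipping one token is a greedy simplification of how island and water productions compete in a generated parser.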

Automatic Programming (acm D.1.2), Requirements/Specifications (acm D.2.1), Languages (acm D.2.1.1), Tools (acm D.2.1.3), Formal Definitions and Theory (acm D.3.1), Language Classifications (acm D.3.2), Processors (acm D.3.4), Grammars and Other Rewriting Systems (acm F.4.2)
Software (theme 1)
CWI
Software Engineering [SEN]
Software Analysis and Transformation

Moonen, L.M.F. (2001). Generating robust parsers using island grammars. Software Engineering [SEN]. CWI.