Ambiguity Detection Methods for Context-Free Grammars
The Meta-Environment enables the creation of grammars using the SDF formalism. From these grammars an SGLR parser can be generated. One of the advantages of these parsers is that they can handle the entire class of context-free grammars (CFGs). The grammar developer does not have to squeeze his grammar into a specific subclass of CFGs that is deterministically parsable. Instead, he can design his grammar to best describe the structure of his language. The downside of allowing the entire class of CFGs is the danger of ambiguities. An ambiguous grammar prevents some sentences from having a unique meaning, depending on the semantics of the language used. It is best to remove all ambiguities from a grammar before it is used. Unfortunately, detecting ambiguities in a grammar is an undecidable problem. For a recursive grammar the number of possibilities that have to be checked might be infinite. Various ambiguity detection methods (ADMs) exist, but none can always correctly identify the (un)ambiguity of a grammar. They all attack the problem from different angles, which results in different characteristics such as termination, accuracy and performance. The goal of this project was to find out which method has the best practical usability. In particular, we investigated their usability in common use cases of the Meta-Environment, which we try to represent with a collection of about 120 grammars with varying numbers of ambiguities. We distinguish three categories: small (fewer than 17 production rules), medium (fewer than 200 production rules) and large (between 200 and 500 production rules). On these grammars we have benchmarked three implementations of ADMs: AMBER (a derivation generator), MSTA (a parse table generator used as the LR(k) test) and a modified Bison tool which implements the ADM of Schmitz. We have measured their accuracy, performance and termination on the grammar collections.
From the results we analyzed their scalability (the degree to which accuracy can be traded for performance) and their practical usability. The conclusion of this project is that AMBER was the most practically usable on our grammars. If it terminates, which it did on most of our grammars, then all its other characteristics are very helpful. Schmitz' method at LR(1) precision was also quite usable on the medium grammars, but needed too much memory on the large ones. Its downside is that its reports are hard to comprehend and verify. The insights gained during this project have led to the development of a new hybrid ADM. It uses Schmitz' method to filter out parts of a grammar that are guaranteed to be unambiguous. The remainder of the grammar is then tested with a derivation generator, which might find ambiguities in less time. We have built a small prototype which was indeed faster than AMBER on the tested grammars, making it the most usable ADM of all.
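
To illustrate the general idea behind a derivation generator, the following is a minimal sketch of a bounded brute-force search. It is not AMBER's actual algorithm. It enumerates all leftmost derivations up to a given sentence length and reports every sentence that has more than one. The function name ambiguous_sentences, the dictionary representation of the grammar and the default step bound are assumptions made for this example. The sketch also assumes a grammar without empty productions, so that a sentential form never shrinks. Because the search is bounded, an empty result only means that no ambiguity was found within the bound, which mirrors the termination problem described above.

```python
from collections import Counter


def ambiguous_sentences(grammar, start, max_len, max_steps=None):
    """Return {sentence: number of leftmost derivations} for every sentence
    of at most max_len symbols that has two or more leftmost derivations.

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of grammar is a terminal.
    Assumption: the grammar has no empty productions.
    """
    if max_steps is None:
        # Hypothetical default; raise it for grammars with long chains of
        # unit productions, otherwise long derivations are cut off.
        max_steps = 2 * max_len

    counts = Counter()
    stack = [((start,), 0)]  # (sentential form, derivation steps so far)
    while stack:
        form, steps = stack.pop()
        # Position of the leftmost nonterminal, or None for a sentence.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            counts[form] += 1  # one more leftmost derivation of this sentence
            continue
        if steps == max_steps:
            continue  # step bound reached; also guards against unit cycles
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Without empty productions a form never gets shorter, so forms
            # longer than max_len cannot lead to a sentence within the bound.
            if len(new_form) <= max_len:
                stack.append((new_form, steps + 1))

    return {" ".join(s): n for s, n in counts.items() if n > 1}


if __name__ == "__main__":
    expr = {"E": [("E", "+", "E"), ("a",)]}
    print(ambiguous_sentences(expr, "E", max_len=5))
```

Each distinct leftmost derivation corresponds to a distinct parse tree, so a count above one is a concrete witness of ambiguity that a grammar developer can inspect directly. The cost grows exponentially with the length bound, which is why a filtering step that first discards provably unambiguous parts of the grammar can pay off.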