We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth investing in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "modern legacy systems"? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measure precision and recall. The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from "modern legacy" code and therefore domain model recovery can be a valuable component of a domain re-engineering process.
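The precision/recall comparison described above can be illustrated with a minimal sketch. It assumes each model is reduced to a set of element names and that an element counts as matched only when the same name appears in both models; this is a simplification for illustration, not the mapping procedure used in the paper. Precision is then the fraction of recovered elements that also occur in the reference model (high precision means little junk), and recall is the fraction of the reference model that was recovered. The function name `precision_recall` and the example element names are hypothetical.

```python
from __future__ import annotations


def precision_recall(recovered: set[str], reference: set[str]) -> tuple[float, float]:
    """Compare a recovered model to a reference model (hypothetical helper).

    Assumes both models are plain sets of element names and that matching
    is exact name equality.
    """
    matched = recovered & reference  # elements present in both models
    # Precision: share of recovered elements that are in the reference model.
    precision = len(matched) / len(recovered) if recovered else 0.0
    # Recall: share of reference elements that were recovered.
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical element names, for illustration only.
    recovered = {"Customer", "Invoice", "Payment", "LogFile"}
    reference = {"Customer", "Invoice", "Payment", "Discount", "TaxRate"}
    print(precision_recall(recovered, reference))  # (0.75, 0.6)
```

Here three of the four recovered elements match, giving a precision of 0.75, while three of the five reference elements are covered, giving a recall of 0.6. A real comparison would also need to handle renamed or partially matching concepts, which plain set intersection ignores.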

IEEE Computer Society
doi.org/10.1109/ICSM.2013.23
Domain Specific Languages: A Big Future for Small Programs
IEEE International Conference on Software Maintenance
Software Engineering

Klint, P., Landman, D., & Vinju, J. (2013). Exploring the Limits of Domain Model Recovery. In 29th IEEE International Conference on Software Maintenance (ICSM), 2013 (pp. 120–129). IEEE Computer Society. doi:10.1109/ICSM.2013.23