The goal of software renovation is to modernize software. One way to achieve this is to first reverse engineer the essential concepts and abstractions used in the software from source code and then use these during renovation. Scaling reverse engineering to large software systems requires automated analysis. Automation often comes at the cost of over-approximation or under-approximation. This thesis explores the limits of and opportunities for these approximations via three research questions.

First, we have explored the limits of domain model recovery by manually recovering domain models. Comparing these models to a manually constructed reference domain model, we found that most domain information could be recovered, with high quality, from the source code.

Second, we have explored using both Cyclomatic Complexity (CC) and Source Lines of Code (SLOC) for automating reverse engineering. Almost all of the literature claims a strong linear correlation between these two metrics. This is often interpreted as an indication that CC and SLOC are redundant with each other. In two large corpora we did not observe a strong correlation. We interpret this as a lack of evidence for CC being redundant with SLOC.

Finally, we have explored the limits of statically analyzing Java’s Reflection API. Analyzing a representative corpus revealed that 78% of all projects use Reflection. After identifying the common assumptions and limitations of relevant static analysis tools, we found that these assumptions are widely challenged in the corpus. We propose new opportunities for static analysis tools.

P. Klint (Paul), J.J. Vinju (Jurgen)
Universiteit van Amsterdam
hdl.handle.net/11245.1/d7139e2b-7581-4ef8-af89-0a1df03d492e
IPA dissertation series
Software Analysis and Transformation

Landman, D. (2017, October 5). Reverse engineering source code : empirical studies of limitations and opportunities. IPA dissertation series. Retrieved from http://hdl.handle.net/11245.1/d7139e2b-7581-4ef8-af89-0a1df03d492e