Repository mining research is a data-intensive domain focused on source code. There are many ways to search for code in the worldwide software ecosystem, but existing search methods are inefficient and cover only small parts of it. One of the problems is granularity: one can search at file level and cover a significant part of the ecosystem, or search for a single line of code and cover only a small part, but not both. We propose SearchSECO: a language-agnostic search engine and research platform that searches through abstract representations of source code methods. We use SearchSECO to search across the worldwide software ecosystem and index the methods encountered. SearchSECO advances the field because it (1) provides finer-grained and more efficient searches, (2) covers more of the software ecosystem than other search mechanisms, and (3) provides mechanisms for tracing source code provenance.
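
To make the idea of searching over abstract representations of methods concrete, the following is a minimal sketch and not SearchSECO's actual implementation. It assumes that abstraction means replacing identifiers and literals with placeholders and discarding comments and layout, and that the abstracted method is then hashed so it can be looked up in an index. The names `abstract_method`, `method_fingerprint`, and `index` are hypothetical, and the sketch handles Python source only, whereas the platform described here is language-agnostic.

```python
import hashlib
import io
import keyword
import tokenize
from collections import defaultdict


def abstract_method(source: str) -> str:
    """Return a normalized token string for one method (assumed abstraction)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Keep language keywords, hide user-chosen identifiers.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        elif tok.type == tokenize.NEWLINE:
            out.append("NL")
        elif tok.type == tokenize.INDENT:
            out.append("INDENT")
        elif tok.type == tokenize.DEDENT:
            out.append("DEDENT")
        # Comments, blank lines, and the end marker are dropped.
    return " ".join(out)


def method_fingerprint(source: str) -> str:
    """Hash the abstracted method so equal structures share one key."""
    return hashlib.sha256(abstract_method(source).encode("utf-8")).hexdigest()


# Hypothetical index: fingerprint -> places where the method was seen.
index = defaultdict(list)

original = "def add(x, y):\n    # sum two values\n    return x + y\n"
renamed = "def plus(left, right):\n        return left + right\n"
different = "def sub(x, y):\n    return x - y\n"

index[method_fingerprint(original)].append("project-a/math.py")
index[method_fingerprint(renamed)].append("project-b/util.py")
index[method_fingerprint(different)].append("project-c/ops.py")
```

Under these assumptions, `original` and `renamed` land under the same key even though their names, comments, and indentation differ, while `different` gets its own key. A method-level lookup of this kind sits between file-level and line-level search, and the list stored under each key hints at how provenance could be traced.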

Software Improvement Group
CEUR Workshop Proceedings
19th Belgium-Netherlands Software Evolution Workshop, BENEVOL 2020
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Jansen, S., Farshidi, S., Gousios, G., van der Storm, T., Visser, J., & Bruntink, M. (2020). SearchSECO: A worldwide index of the open source software ecosystem. In Proceedings of the Belgium-Netherlands Software Evolution Workshop.