Finding and enumerating common molecular substructures is an important task in cheminformatics, where small molecules are often modeled as molecular graphs. We introduce the problem of enumerating all maximal k-common molecular fragments of a pair of molecular graphs. A k-common fragment is a common connected induced subgraph that consists of a common core and a common k-neighborhood. It is thus a generalization of the NP-hard task to enumerate all maximal common connected induced subgraphs (MCCIS) of two graphs, which corresponds to the k = 0 case. We extend the MCCIS enumeration algorithm by Ina Koch and apply algorithm engineering techniques to solve practical instances fast for the general k > 0 case, which is relevant, for example, for automatically generating force field topologies for molecular dynamics (MD) simulations. We find that our methods achieve good performance on a real-world benchmark of all-against-all comparisons of 255 molecules. Our software is available under the LGPL open source license at .

Additional Metadata
Project Enhancing protein-drug binding prediction
Grant This work was funded by the CWI PPS samenwerking; grant id pps/NLeSC P 15.0238 - Enhancing protein-drug binding prediction
Engler, M.S, El-Kebir, M, Mulder, J.N, Mark, A.E, Geerke, D.P, & Klau, G.W. (2017). Enumerating common molecular substructures.