Enumerating common molecular substructures
Finding and enumerating common molecular substructures is an important task in cheminformatics, where small molecules are often modeled as molecular graphs. We introduce the problem of enumerating all maximal k-common molecular fragments of a pair of molecular graphs. A k-common fragment is a common connected induced subgraph that consists of a common core and a common k-neighborhood. It is thus a generalization of the NP-hard task to enumerate all maximal common connected induced subgraphs (MCCIS) of two graphs, which corresponds to the k = 0 case. We extend the MCCIS enumeration algorithm by Ina Koch and apply algorithm engineering techniques to solve practical instances fast for the general k > 0 case, which is relevant, for example, for automatically generating force field topologies for molecular dynamics (MD) simulations. We find that our methods achieve good performance on a real-world benchmark of all-against-all comparisons of 255 molecules. Our software is available under the LGPL open source license at https://github.com/enitram/mogl .