Background. We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem arises as a difficult subproblem in coevolution-based computational approaches to protein-protein interaction prediction.

Results. Like prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins. We provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs, and its score quantifies the coevolution of those pairs. This quantification allows us to give a maximum likelihood formulation of the paralog mapping problem and to cast it as a binary quadratic program.

Conclusion. CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it possible, for the first time, to compute pairings of state-of-the-art quality within a few minutes of runtime. In summary, we propose a novel alternative to previously available approaches that is both statistically sound and computationally feasible.
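The idea of scoring pairs of mapped pairs can be illustrated with a toy sketch. This is not CUPID's likelihood model or solver. The unit score below is a hypothetical stand-in: it rewards the distance between two proteins in family A matching the distance between their assigned partners in family B, which mirrors the "equal rates of evolution" intuition. The sketch also assumes, for simplicity, that both families have the same size and that the mapping is one-to-one. With binary variables x_ij (protein i of A mapped to protein j of B), summing a unit score over every pair of mapped pairs gives an objective that is quadratic in the x_ij. Here that objective is maximized by exhaustive search over all permutations, which only works for tiny inputs. The abstract names Lagrangian relaxation as the method that makes realistic sizes tractable. The names `unit_score`, `mapping_score` and `best_mapping` are hypothetical.

```python
from itertools import combinations, permutations
from math import inf


def unit_score(dA, dB, i, k, j, l):
    """Hypothetical score of the unit formed by mapped pairs (i -> j) and (k -> l).

    Assumption (not CUPID's model): equal evolutionary rates mean the A-distance
    between i and k should equal the B-distance between j and l, so the squared
    mismatch is penalized.
    """
    return -(dA[i][k] - dB[j][l]) ** 2


def mapping_score(mapping, dA, dB):
    # mapping[i] == j means protein i of family A is paired with protein j of family B.
    # Quadratic objective: the sum of unit scores over all pairs of mapped pairs.
    return sum(
        unit_score(dA, dB, i, k, mapping[i], mapping[k])
        for i, k in combinations(range(len(mapping)), 2)
    )


def best_mapping(dA, dB):
    # Exhaustive search over all one-to-one mappings; feasible only for tiny families.
    best, best_score = None, -inf
    for perm in permutations(range(len(dA))):
        score = mapping_score(perm, dA, dB)
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


# Toy distance matrices: family B is family A relabeled by the mapping (2, 0, 1).
dA = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
dB = [[0, 4, 1], [4, 0, 5], [1, 5, 0]]
print(*best_mapping(dA, dB))  # prints "(2, 0, 1) 0"
```

Because every unit depends on two assignment decisions at once, the score of one pairing cannot be evaluated in isolation. This is why the problem is quadratic rather than a linear assignment problem, and why exhaustive search quickly becomes infeasible.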

Additional Metadata
Theme: Life Sciences (theme 5)
Publisher: BioMed Central
Journal: BMC Bioinformatics
Conference: RECOMB Comparative Genomics
El-Kebir, M, Marschall, T, Wohlers, I, Patterson, M.D, Heringa, J, Schönhuth, A, & Klau, G.W. (2013). Mapping proteins in the presence of paralogs using units of coevolution. BMC Bioinformatics.