OGRE: Overlap Graph-based metagenomic Read clustEring

Balvert, Marleen; Luo, Vincent; Hauptfeld, Ernestina; Schönhuth, Alexander; Dutilh, Bas

doi:10.1093/bioinformatics/btaa760

M. Balvert (Marleen), X. Luo (Vincent), T. Hauptfeld (Ernestina), A. Schönhuth (Alexander) and B.E. Dutilh (Bas)

2021-04-01


Bioinformatics, Volume 37, Issue 7, p. 905-912

MOTIVATION: The microbes that live in an environment can be identified from their combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reducing the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes.

RESULTS: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity.

CONCLUSION: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues.

AVAILABILITY AND IMPLEMENTATION: Code is made available on GitHub (https://github.com/Marleen1/OGRE).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
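As a rough illustration of the ideas named in the abstract, the sketch below groups reads by the connected components of a toy overlap graph and then computes the two evaluation measures mentioned under RESULTS: the number of species per cluster and the fraction of reads placed in a cluster. This is not OGRE's implementation. The function names, the union-find grouping, the choice to treat reads without any overlap as unclustered, and the toy data are all assumptions made for this example only.

```python
from collections import defaultdict


def cluster_reads(n_reads, overlaps):
    """Group reads into connected components of an overlap graph.

    Hypothetical helper: `overlaps` is a list of (read_a, read_b) index
    pairs. Reads with no overlap at all are treated as unclustered.
    """
    parent = list(range(n_reads))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for read in range(n_reads):
        groups[find(read)].append(read)
    # Keep only groups that contain at least two overlapping reads.
    return [members for members in groups.values() if len(members) > 1]


def species_per_cluster(clusters, species_of):
    """Number of distinct species in each cluster (1 means a pure cluster)."""
    return [len({species_of[r] for r in members}) for members in clusters]


def fraction_clustered(clusters, n_reads):
    """Fraction of all reads that ended up in some cluster."""
    return sum(len(members) for members in clusters) / n_reads


# Toy data (made up): six reads from three species, three overlaps.
species_of = ["A", "A", "A", "B", "B", "C"]
overlaps = [(0, 1), (1, 2), (3, 4)]

clusters = cluster_reads(len(species_of), overlaps)
print(clusters)
print(species_per_cluster(clusters, species_of))
print(fraction_clustered(clusters, len(species_of)))
```

In this toy case every cluster contains reads from a single species, and the read from species C stays unclustered because it overlaps nothing. Real overlap graphs are far larger and noisier, which is the scaling problem the abstract describes.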

Additional Metadata
Persistent URL	doi.org/10.1093/bioinformatics/btaa760
Journal	Bioinformatics
Project	Statistical Models for Structural Genetic Variants in the Genome of the Netherlands
Grant	This work was funded by the Netherlands Organisation for Scientific Research (NWO); grant id nwo/639.072.309 - Statistical Models for Structural Genetic Variants in the Genome of the Netherlands
Citation	Balvert, M., Luo, V., Hauptfeld, E., Schönhuth, A., & Dutilh, B. (2021). OGRE: Overlap Graph-based metagenomic Read clustEring. Bioinformatics, 37(7), 905–912. doi:10.1093/bioinformatics/btaa760


See Also
techReport OGRE: Overlap Graph-based metagenomic Read clustEring M. Balvert (Marleen), T. Hauptfeld (Ernestina), A. Schönhuth (Alexander) and B.E. Dutilh (Bas)
software|data OGRE M. Balvert (Marleen), X. Luo (Vincent), T. Hauptfeld (Ernestina), A. Schönhuth (Alexander) and B.E. Dutilh (Bas)
