Overlap graph-based generation of haplotigs for diploids and polyploids

Baaijens, Jasmijn; Schönhuth, Alexander

doi:10.1093/bioinformatics/btz255

J.A. Baaijens (Jasmijn) and A. Schönhuth (Alexander)

2019-04-17

Overlap graph-based generation of haplotigs for diploids and polyploids

Bioinformatics , Volume 35 - Issue 21 p. 4281- 4289

Motivation: Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity.

Results: We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings.

Availability and implementation: POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++.

Supplementary information: Supplementary data are available at Bioinformatics online.

Additional Metadata
Persistent URL	doi.org/10.1093/bioinformatics/btz255
Journal	Bioinformatics
Project	Statistical Models for Structural Genetic Variants in the Genome of the Netherlands
Grant	This work was funded by the The Netherlands Organisation for Scientific Research (NWO); grant id nwo/639.072.309 - Statistical Models for Structural Genetic Variants in the Genome of the Netherlands
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation APA Style AAA Style APA Style Cell Style Chicago Style Harvard Style IEEE Style MLA Style Nature Style Vancouver Style American-Institute-of-Physics Style Council-of-Science-Editors Style BibTex Format Endnote Format RIS Format CSL Format DOIs only Format	Baaijens, J., & Schönhuth, A. (2019). Overlap graph-based generation of haplotigs for diploids and polyploids. Bioinformatics, 35(21), 4281–4289. doi:10.1093/bioinformatics/btz255

View at Publisher

Full Text ( Final Version , 470kb )

Overlap graph-based generation of haplotigs for diploids and polyploids

Publication

Publication

Address

CWI researchers

Questions or comments?

Overlap graph-based generation of haplotigs for diploids and polyploids

Publication

Publication

Workflow

Workflow

Add Content