Source data used in the manuscript "VeChat: Correcting errors in noisy long reads using variation graphs"

Luo, Vincent; Kang, Xiongbin; Schönhuth, Alexander

Error correction is the canonical first step in long-read sequencing data analysis. The current standard is to use a consensus sequence as a template. However, in mixed samples, such as metagenomes or organisms of higher ploidy, consensus-induced biases can mask true variants affecting haplotypes of lower frequency, because these variants are mistaken for errors. The novelty presented here is to use a graph-based, instead of a sequence-based, consensus as the template for identifying errors. The advantage is that graph-based reference systems also capture variants of lower frequency, and so do not mistakenly mask them as errors. We present VeChat as a novel approach that implements this idea: VeChat distinguishes errors from haplotype-specific true variants based on variation graphs, a popular type of data structure for pangenome reference systems. After an ad-hoc variation graph has been constructed from the raw input reads, nodes and edges that are due to errors are pruned from that graph by an iterative procedure based on principles from frequent itemset mining. Upon termination, the graph exclusively contains nodes and edges reflecting true sequential phenomena. Final re-alignments of the raw reads against the pruned graph indicate where and how reads need to be corrected.
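The pruning idea described above can be illustrated with a toy sketch. This is not VeChat's actual procedure: it assumes reads are already given as paths over hypothetical node IDs, and it uses a plain support threshold (the basic notion underlying frequent itemset mining) applied in a single pass, whereas the abstract describes an iterative procedure whose criteria are not detailed in this record. The function names and the `min_support` parameter are illustrative only.

```python
from collections import Counter


def edge_support(paths):
    """Count how many read paths traverse each directed edge."""
    support = Counter()
    for path in paths:
        # Treat each read as a set of edges, so an edge counts once per read.
        support.update(set(zip(path, path[1:])))
    return support


def prune_low_support_edges(paths, min_support):
    """Return the edges traversed by at least `min_support` reads."""
    return {edge for edge, n in edge_support(paths).items() if n >= min_support}


# Toy reads as paths over hypothetical node IDs (an assumption of this sketch).
reads = [
    ["start", "var_a", "end"],  # higher-frequency haplotype
    ["start", "var_a", "end"],
    ["start", "var_a", "end"],
    ["start", "var_b", "end"],  # lower-frequency haplotype, still recurrent
    ["start", "var_b", "end"],
    ["start", "err", "end"],    # path seen in a single read, likely an error
]

kept = prune_low_support_edges(reads, min_support=2)
```

In this toy input, the edges through `var_b` survive because two reads support them, while the edges through `err` are dropped. A sequence-level consensus would instead retain only the majority path through `var_a`.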

Additional Metadata
Organisation	Evolutionary Intelligence
Citation	Luo, V., Kang, X., & Schönhuth, A. (2022). Source data used in the manuscript "VeChat: Correcting errors in noisy long reads using variation graphs".

Additional Files
View at Zenodo
View at GitHub

See Also
article VeChat: correcting errors in long reads using variation graphs X. Luo (Vincent), X. Kang (Xiongbin) and A. Schönhuth (Alexander)
