Efficient computation of sequence mappability

Charalampopoulos, Panagiotis; Iliopoulos, Costas; Kociumaka, Tomasz; Pissis, Solon; Radoszewski, Jakub; Straszyński, Juliusz

doi:10.1007/s00453-022-00934-y

2022-02-02

Algorithmica, Volume 84, p. 1418–1440

Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices j ≠ i such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of k = 1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for k = O(1), works in O(n) space and, with high probability, in O(n · min{m^k, log^k n}) time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop O(n^2)-time algorithms to compute all (k, m)-mappability tables for a fixed m and all k ∈ {0, …, m} or a fixed k and all m ∈ {k, …, n}. Finally, we show that, for k, m = Θ(log n), the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
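To make the problem definition concrete, the following is a minimal brute-force sketch of the (k, m)-mappability table in Python. It is not the algorithm of the paper: it simply compares every pair of length-m substrings and therefore takes O(n^2 · m) time in the worst case. The function name `naive_mappability` and the 0-based indexing are assumptions made for illustration only.

```python
from typing import List


def naive_mappability(text: str, m: int, k: int) -> List[int]:
    """Brute-force (k, m)-mappability table (illustrative baseline only).

    Entry i is the number of positions j != i such that text[i:i+m] and
    text[j:j+m] differ in at most k positions (Hamming distance <= k).
    Positions are 0-based; only positions with a full length-m substring
    are included.
    """
    count = len(text) - m + 1          # number of length-m substrings
    if m <= 0 or count <= 0:
        return []
    table = [0] * count
    for i in range(count):
        for j in range(i + 1, count):
            mismatches = 0
            for a, b in zip(text[i:i + m], text[j:j + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > k:  # early exit once the budget is exceeded
                        break
            if mismatches <= k:
                # Hamming distance is symmetric, so the pair counts for both.
                table[i] += 1
                table[j] += 1
    return table


print(naive_mappability("abab", 2, 0))  # [1, 0, 1]
```

For T = "abab" with m = 2 and k = 0, the length-2 substrings are "ab", "ba", "ab"; the two copies of "ab" match each other exactly and "ba" matches nothing, giving the table [1, 0, 1]. A baseline like this is mainly useful for checking faster implementations on small inputs.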

Additional Metadata
Keywords	Sequence mappability, k-errata tree, Hamming distance
Persistent URL	doi.org/10.1007/s00453-022-00934-y
Journal	Algorithmica
Project	Pan-genome Graph Algorithms and Data Integration, Algorithms for PAngenome Computational Analysis
Grant	This work was funded by the European Commission 7th Framework Programme; grant id h2020/872539 - Pan-genome Graph Algorithms and Data Integration (PANGAIA), This work was funded by the European Commission 7th Framework Programme; grant id h2020/956229 - Algorithms for PAngenome Computational Analysis (ALPACA)
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Charalampopoulos, P., Iliopoulos, C., Kociumaka, T., Pissis, S., Radoszewski, J., & Straszyński, J. (2022). Efficient computation of sequence mappability. Algorithmica, 84, 1418–1440. https://doi.org/10.1007/s00453-022-00934-y
