Elastic-degenerate string matching with 1 error
An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an O~(nmω−1)+O(N)-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where ω denotes the matrix multiplication exponent and the O~(⋅) notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in O(k2mG+kN) time under edit distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for k=1, the existing algorithm runs in Ω(mN) time in the worst case. Here we make progress in this direction. We show that 1-EDSM can be solved in O((nm2+N)logm) or O(nm3+N) time under edit distance. For the decision version of the problem, we present a faster O(nm2logm−−−−−√+Nloglogm)-time algorithm. Our algorithms rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or range emptiness), which we show how to solve efficiently.
|, , , ,|
|Lecture Notes in Computer Science|
|LATIN 2022: Theoretical Informatics|
Bernardini, G, Gabory, E.A.D, Pissis, S, Stougie, L, Sweering, M.J.M, & Zuba, W.P. (2022). Elastic-degenerate string matching with 1 error. In LATIN: Latin American Symposium on Theoretical Informatics (pp. 20–37). doi:10.1007/978-3-031-20624-5_2