An elastic-degenerate (ED) string $\mathrm{T}$ is a sequence $\mathrm{T} = \mathrm{T}[1] · · · \mathrm{T}[n]$ of $n$ finite sets of strings. The cardinality $m$ of $\mathrm{T}$ is the total number of strings in $\mathrm{T}[i]$, for all $i ∈ [1 . . n]$. The size $\mathrm{N}$ of $\mathrm{T}$ is the total length of all $m$ strings of $\mathrm{T}$. ED strings have been introduced to represent a set of closely-related DNA sequences. Let $P = P[1 . . p]$ be a pattern of length $p$ and $k > 0$ be an integer. We consider the problem of $k$-Approximate ED String Matching (EDSM): searching $k$-approximate occurrences of $P$ in the language of $\mathrm{T}$. We call $k$-Approximate EDSM under the Hamming distance, $k$-Mismatch EDSM; and we call $k$-Approximate EDSM under edit distance, $k$-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple $\mathcal{O}(kmp + kN)$-time algorithm for $k$-Mismatch EDSM and an $\mathcal{O}(k2mp + kN)$-time algorithm for $k$-Edit EDSM. We improve the dependency on $k$ in both results, obtaining an $\mathcal{Õ}(k2/3mp +√kN)$-time algorithm for $k$-Mismatch EDSM and an $\mathcal{Õ}(kmp + kN)$-time algorithm for $k$-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in $\mathcal{Õ}(np2 + N)$ time. They have also left the possibility to generalize these solutions for $k > 1$ as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from $\mathcal{Õ}(np2 + N)$ to $\mathcal{O}(np2 + N)$. We further show algorithms for $k$-Approximate EDSM for the Hamming and edit distances working in $\mathcal{Õ}(np2 + N)$ time, for any constant $k > 0$. Finally, we show how our techniques can be applied to improve upon the complexity of the $k$-Approximate ED String Intersection and $k$-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).

, , ,
doi.org/10.4230/LIPIcs.CPM.2025.28
36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Pissis, S., Radoszewski, J., & Zuba, W. (2025). Faster approximate elastic-degenerate string matching – Part A. In Leibniz International Proceedings in Informatics (LIPIcs) (pp. 28:1–28:19). doi:10.4230/LIPIcs.CPM.2025.28