2025-05-01
Elastic-degenerate string comparison
Publication
Information and Computation, Volume 304, p. 105296:1–105296:23
An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1],\ldots,T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality, and the size of $T$, respectively. The language of $T$ is defined as $L(T)=\{S_1\cdots S_n : S_i\in T[i] \text{ for all } i\in[1,n]\}$. Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call this problem the ED STRING INTERSECTION (EDSI) problem. For two ED strings $T_1$ and $T_2$ of lengths $n_1$ and $n_2$, cardinalities $m_1$ and $m_2$, and sizes $N_1$ and $N_2$, respectively, we show the following:
• There is no $O((N_1N_2)^{1-\epsilon})$-time algorithm, for any $\epsilon>0$, for EDSI even if $T_1$ and $T_2$ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
• There is no combinatorial $O((N_1+N_2)^{1.2-\epsilon}f(n_1,n_2))$-time algorithm, for any $\epsilon>0$ and any function $f$, for EDSI even if $T_1$ and $T_2$ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
• An $O(N_1\log N_1\log n_1+N_2\log N_2\log n_2)$-time algorithm for outputting a compact representation of the intersection language of two unary ED strings. When $T_1$ and $T_2$ are given in a compact representation, we show that the problem is NP-complete.
• An $O(N_1m_2+N_2m_1)$-time algorithm for EDSI.
• An $\tilde{O}(N_1^{\omega-1}n_2+N_2^{\omega-1}n_1)$-time algorithm for EDSI, where $\omega$ is the matrix multiplication exponent; the $\tilde{O}$ notation suppresses factors that are polylogarithmic in the input size.
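To make the definitions concrete, the following is a minimal Python sketch of the EDSI problem that enumerates both languages explicitly. It is a brute-force illustration only, not any of the algorithms from the paper: its running time is exponential in the length of the ED strings. The function names (`language`, `measures`, `edsi_naive`) and the list-of-sets encoding are assumptions made for this example.

```python
from itertools import product


def language(ed):
    """Return L(T): all concatenations S1...Sn with Si taken from T[i].

    `ed` is a list of sets of strings (a hypothetical encoding of an ED string).
    The result can be exponentially large, so this is for tiny inputs only.
    """
    return {"".join(choice) for choice in product(*ed)}


def measures(ed):
    """Return (n, m, N): length, cardinality, and size of the ED string.

    Assumption: the size is the plain sum of string lengths, so an empty
    string contributes 0 here.
    """
    n = len(ed)
    m = sum(len(segment) for segment in ed)
    size = sum(len(s) for segment in ed for s in segment)
    return n, m, size


def edsi_naive(t1, t2):
    """Decide EDSI by checking whether L(T1) and L(T2) share a string."""
    return not language(t1).isdisjoint(language(t2))


if __name__ == "__main__":
    t1 = [{"a", "ab"}, {"b", "c"}]   # L(T1) = {ab, ac, abb, abc}
    t2 = [{"ab"}, {"c", "d"}]        # L(T2) = {abc, abd}
    t3 = [{"b"}, {"a"}]              # L(T3) = {ba}
    print(measures(t1))              # (2, 4, 5)
    print(edsi_naive(t1, t2))        # True, both languages contain "abc"
    print(edsi_naive(t1, t3))        # False
```

Here `t1` and `t2` intersect because "ab" followed by "c" spells "abc" in both. The bounds listed above concern how far this exhaustive check can be improved in terms of $n$, $m$, and $N$.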
Additional Metadata | |
---|---|
doi.org/10.1016/j.ic.2025.105296 | |
Information and Computation | |
Pan-genome Graph Algorithms and Data Integration, Algorithms for PAngenome Computational Analysis, Networks, Marie Skłodowska-Curie | |
Organisation | Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands |
Gabory, E., Mwaniki, M. N., Pisanti, N., Pissis, S., Radoszewski, J., Sweering, M., & Zuba, W. (2025). Elastic-degenerate string comparison. Information and Computation, 304, 105296:1–105296:23. doi:10.1016/j.ic.2025.105296 |