2024-03-11
Approximate Circular Pattern Matching under edit distance
Publication
Publication
In the k-Edit Circular Pattern Matching (k-Edit CPM) problem, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions of the substrings of T that are at edit distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented O(nk2)-time and O(nk log3 k)-time solutions for the reporting and decision versions of k-Edit CPM, respectively. Here, we show that the reporting and decision versions of k-Edit CPM can be solved in O(n + (n/m) k6) time and O(n + (n/m) k5 log3 k) time, respectively, thus obtaining the first algorithms with a complexity of the type O(n + (n/m) poly(k)) for this problem. Notably, our algorithms run in O(n) time when m = Ω(k6) and are superior to the previous respective solutions when m = ω(k4). We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer. We obtain our solutions by exploiting the structure of approximate circular occurrences of P in T, when T is relatively short w.r.t. P. Roughly speaking, either the starting positions of approximate occurrences of rotations of P form O(k4) intervals that can be computed efficiently, or some rotation of P is almost periodic (is at a small edit distance from a string with small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).
Additional Metadata | |
---|---|
, , | |
doi.org/10.4230/LIPIcs.STACS.2024.24 | |
Leibniz International Proceedings in Informatics (LIPIcs) , Proceedings of the 41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024 | |
Pan-genome Graph Algorithms and Data Integration , Algorithms for PAngenome Computational Analysis , Marie Skłodowska-Curie | |
41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024 | |
, | |
Organisation | Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands |
Charalampopoulos, P., Pissis, S., Radoszewski, J., Rytter, W., Waleń, T., & Zuba, W. (2024). Approximate Circular Pattern Matching under edit distance. In Proceedings of International Symposium on Theoretical Aspects of Computer Science. doi:10.4230/LIPIcs.STACS.2024.24 |