Faster algorithms for Longest Common Substring

Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Radoszewski, Jakub; Pissis, Solon

doi:10.1145/3774754

P. Charalampopoulos (Panagiotis), T. Kociumaka (Tomasz), J. Radoszewski (Jakub) and S. Pissis (Solon)

2026-02-06

ACM Transactions on Algorithms, Volume 22, Issue 2, pp. 19:1-19:47

In the classic Longest Common Substring (LCS) problem, we are given two strings $S$ and $T$ of total length $n$, over an alphabet of size $\sigma$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. Weiner, in his seminal paper that introduced the suffix tree, presented an $\mathcal{O}(n \log \sigma)$-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an $\mathcal{O}(n)$-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in $\mathcal{O}(n \log \sigma / \log n)$ space and read in $\mathcal{O}(n \log \sigma / \log n)$ time. We show that we can compute an LCS of two strings in time $\mathcal{O}(n \log \sigma / \sqrt{\log n})$ in the word RAM model, which is sublinear in $n$ if $\sigma = 2^{o(\sqrt{\log n})}$ (in particular, if $\sigma = \mathcal{O}(1)$), using optimal space $\mathcal{O}(n \log \sigma / \log n)$. In fact, it was recently shown that this result is conditionally optimal [Kempa and Kociumaka, STOC 2025]. The same complexity can be achieved for computing an LCS of $\lambda = \mathcal{O}(\sqrt{\log n} / \log \log n)$ input strings of total length $n$.

We then lift our ideas to the problem of computing a $k$-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of $S$ that occurs in $T$ with at most $k$ mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in $\mathcal{O}(n \log n)$ time [IPL 2015]. Thankachan et al. showed how to compute a $k$-mismatch LCS in $\mathcal{O}(n \log^k n)$ time for $k = \mathcal{O}(1)$ [J. Comput. Biol. 2016]. We show an $\mathcal{O}(n \log^{k-0.5} n)$-time algorithm, for any constant $k > 0$ and irrespective of the alphabet size, using $\mathcal{O}(n)$ space, as in the previous approaches. We thus notably break through the well-known $n \log^k n$ barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. for string indexing with $k$ errors [STOC 2004].
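To make the two problem statements in the abstract concrete, the following is a minimal quadratic-time baseline in Python. It is not the algorithm of the paper, which relies on word RAM techniques and is far more involved; it is only a sketch of what is being computed. The function name `k_mismatch_lcs` and its parameters are hypothetical. The sketch slides a window along every diagonal of the alignment between $S$ and $T$ and keeps the longest window containing at most $k$ mismatching positions; with $k = 0$ it reduces to the classic LCS problem.

```python
def k_mismatch_lcs(s: str, t: str, k: int = 0) -> str:
    """Return a longest substring of s that matches an equal-length
    substring of t with at most k mismatching positions.

    Naive O(|s| * |t|) baseline for illustration only.
    """
    n, m = len(s), len(t)
    best_len, best_start = 0, 0

    # Diagonal d aligns s[i] with t[j] whenever i - j == d.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)        # first position of s on this diagonal
        j0 = i0 - d           # matching first position of t
        length = min(n - i0, m - j0)

        left = 0              # left end of the current window (inclusive)
        mismatches = 0        # mismatches inside the current window
        for right in range(length):
            if s[i0 + right] != t[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if s[i0 + left] != t[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best_len:
                best_len, best_start = window, i0 + left

    return s[best_start:best_start + best_len]


if __name__ == "__main__":
    print(k_mismatch_lcs("abcde", "xbcdy"))       # bcd
    print(k_mismatch_lcs("abcde", "xbcdy", k=1))  # abcd
```

Each diagonal is scanned once with two pointers, so the total work is proportional to $|S| \cdot |T|$. This is useful as a reference when testing faster implementations, but it is far from the near-linear bounds discussed above.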

Additional Metadata
Keywords	k mismatches, Longest Common Substring, Wavelet tree
Persistent URL	doi.org/10.1145/3774754
Journal	ACM Transactions on Algorithms
Project	Pan-genome Graph Algorithms and Data Integration, Algorithms for PAngenome Computational Analysis
Rights	creativecommons.org/licenses/by/4.0/
Grant	This work was funded by the European Commission 7th Framework Programme; grant id h2020/872539 - Pan-genome Graph Algorithms and Data Integration (PANGAIA), This work was funded by the European Commission 7th Framework Programme; grant id h2020/956229 - Algorithms for PAngenome Computational Analysis (ALPACA)
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Charalampopoulos, P., Kociumaka, T., Radoszewski, J., & Pissis, S. (2026). Faster algorithms for Longest Common Substring. ACM Transactions on Algorithms, 22(2), 19:1–19:47. https://doi.org/10.1145/3774754

Free Full Text (Final Version, 3 MB)

See Also
inProceedings Faster algorithms for longest common substring P. Charalampopoulos (Panagiotis), T. Kociumaka (Tomasz), S. Pissis (Solon) and J. Radoszewski (Jakub)
