In the classic Longest Common Substring (LCS) problem, we are given two strings $S$ and $T$ of total length $n$, over an alphabet of size $\sigma$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. Weiner, in his seminal paper that introduced the suffix tree, presented an $\mathcal{O}(n \log \sigma)$-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an $\mathcal{O}(n)$-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in $\mathcal{O}(n \log \sigma/\log n )$ space and read in $\mathcal{O}(n \log \sigma/\log n )$ time. We show that we can compute an LCS of two strings in time $\mathcal{O}(n \log \sigma / \sqrt{\log n})$ in the word RAM model, which is sublinear in $n$ if $\sigma=2^{o(\sqrt{\log n})}$ (in particular, if $\sigma=\mathcal{O}(1)$), using optimal space $\mathcal{O}(n \log \sigma/\log n)$. In fact, it was recently shown that this result is conditionally optimal [Kempa and Kociumaka, STOC 2025]. The same complexity can be achieved for computing an LCS of $\lambda = \mathcal{O}(\sqrt{\log n}) / \log {\log n})$ input strings of total length $n$. We then lift our ideas to the problem of computing a $k$-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of $S$ that occurs in $T$ with at most $k$ mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in $\mathcal{O}(n \log n)$ time [IPL 2015]. Thankachan et al. showed how to computing a $k$-mismatch LCS in $\mathcal{O}(n \log^k n)$ time for $k=\mathcal{O}(1)$ [J. Comput. Biol. 2016]. We show an $\mathcal{O}(n \log^{k-0.5} n)$-time algorithm, for any constant $k>0$ and irrespective of the alphabet size, using $\mathcal{O}(n)$ space as the previous approaches. We thus notably break through the well-known $n \log^k n$ barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with $k$ errors [STOC 2004].

, ,
doi.org/10.1145/3774754
ACM Transactions on Algorithms
Pan-genome Graph Algorithms and Data Integration , Algorithms for PAngenome Computational Analysis
creativecommons.org/licenses/by/4.0/
,
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Charalampopoulos, P., Kociumaka, T., Radoszewski, J., & Pissis, S. (2026). Faster algorithms for Longest Common Substring. ACM Transactions on Algorithms, 22(2), 19:1–19:47. doi:10.1145/3774754