2026-06-27
Minimizing the minimizers via alphabet reordering
Publication
Publication
Theoretical Computer Science , Volume 1076 p. 115932:1- 115932:13
Minimizer sampling is one of the most widely-used mechanisms for sampling strings [Schleimer et al., SIGMOD 2003; Roberts et al., Bioinformatics 2004]. Let S=S[1]…S[n] be a string over a totally ordered alphabet Σ. Further let w ≥ 2 and k ≥ 1 be two integers. The minimizer of S[i.i+w+k−2] is the smallest position in [i,i+w−1] where the lexicographically smallest length-k substring of S[i.i+w+k−2] starts. The set of minimizers over all i∈[1,n−w−k+2] is the set Mw,k(S) of the minimizers of S. We consider the following basic problem: Given S, w, and k, can we efficiently compute a total order on Σ that minimizes |Mw,k(S)|?We show that this is unlikely by proving that the problem is NP-hard for any w ≥ 2 and k ≥ 1. Our result provides theoretical justification as to why there exist no exact algorithms for minimizing the minimizer samples, while there exists a plethora of heuristics for the same purpose.
| Additional Metadata | |
|---|---|
| , , , , | |
| doi.org/10.1016/j.tcs.2026.115932 | |
| Theoretical Computer Science | |
| Pan-genome Graph Algorithms and Data Integration , Algorithms for PAngenome Computational Analysis , Constance van Eeden Fellowship | |
| , , | |
| Organisation | Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands |
|
Verbeek, H., Ayad, L., Loukides, G., & Pissis, S. (2026). Minimizing the minimizers via alphabet reordering. Theoretical Computer Science, 1076, 115932:1–115932:13. doi:10.1016/j.tcs.2026.115932 |
|