Let Substrk(X) denote the set of length-k substrings of a given string X for a given integer k>0. We study the following basic string problem, called z-ShortestSk-Equivalent Strings: Given a set Sk of n length-k strings and an integer z>0, list z shortest distinct strings T1,…,Tz such that Substrk(Ti)=Sk, for all i∈[1,z]. The z-ShortestSk-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g. in data privacy, data compression, and bioinformatics. The 1-ShortestSk-Equivalent Strings, referred to as ShortestSk-Equivalent String, asks for a shortest string X such that Substrk(X)=Sk. Our main contributions are as follows. Given a directed graph G=(V,E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if ShortestSk-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP. Secondly, we show that the length of a shortest string output by ShortestSk-Equivalent String is in O(k+n2). We generalize this bound by showing that the total length of z shortest strings is in O(zk+zn2+z2n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs. Furthermore, we present an algorithm for solving z-ShortestSk-Equivalent Strings in O(nk+n2log2n+zn2logn+|output|) time. If z=1, the time becomes O(nk+n2log2n) by the fact that the size of the input is Θ(nk) and the size of the output is O(k+n2). Finally, we also provide a direct technical application of our algorithms on strings in an existing data privacy framework. A preliminary version of this paper was announced at CPM 2022.

, , ,
doi.org/10.1007/s00224-025-10240-z
Theory of Computing Systems
Pan-genome Graph Algorithms and Data Integration , Algorithms for PAngenome Computational Analysis
,
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Bernardini, G., Conte, A., Gabory, E., Grossi, R., Loukides, G., Pissis, S., … Sweering, M. (2026). On strings having the same length-k substrings. Theory of Computing Systems, 70(2), 21:1–21:31. doi:10.1007/s00224-025-10240-z