A string cover $C$ of a set of strings $S$ is a set of substrings from $S$ such that every string in $S$ can be written as a concatenation of the strings in $C$. Given costs assigned to each substring from $S$, the \textsc{Minimum String Cover} (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A~previous integer linear programming (ILP) formulation was designed for a special case, in which each string in $S$ must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text ``Alice in Wonderland''. On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU general public license.

string cover, lagrange relaxation, integer linear program
Life Sciences (theme 5), Energy (theme 4)
Workshop on Algorithm Engineering and Experiments
Life Sciences and Health

Canzar, S, Marschall, T, Rahmann, S, & Schwiegelshohn, C. (2012). Solving the Minimum String Cover Problem. In Proceedings of Workshop on Algorithm Engineering and Experiments 2012. ACM.