A string cover $C$ of a set of strings $S$ is a set of substrings from $S$ such that every string in $S$ can be written as a concatenation of the strings in $C$. Given costs assigned to each substring from $S$, the \textsc{Minimum String Cover} (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A~previous integer linear programming (ILP) formulation was designed for a special case, in which each string in $S$ must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text ``Alice in Wonderland''. On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU general public license.

, ,
Workshop on Algorithm Engineering and Experiments
Life Sciences and Health

Canzar, S, Marschall, T, Rahmann, S, & Schwiegelshohn, C. (2012). Solving the Minimum String Cover Problem. In Proceedings of Workshop on Algorithm Engineering and Experiments 2012. ACM.