Weighted Shortest Common Supersequence problem revisited
A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings (Formula presented) and (Formula presented) and a threshold (Formula presented) on probability, and we are asked to compute the shortest (standard) string S such that both (Formula presented) and (Formula presented) match subsequences of S (not necessarily the same) with probability at least (Formula presented). Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold (Formula presented), are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in (Formula presented) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in (Formula presented) time or in (Formula presented) with time, where the (Formula presented) notation suppresses factors polynomial with respect to the instance size (with numeric values encoded in binary), unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in (Formula presented) time, for any function f(z), unless (Formula presented).
|Series||Lecture Notes in Computer Science|
|Conference||International Symposium on String Processing and Information Retrieval|
Charalampopoulos, P, Kociumaka, T, Pissis, S, Radoszewski, J, Rytter, W, Straszyński, J, … Zuba, W. (2019). Weighted Shortest Common Supersequence problem revisited. In Proceedings of the International Symposium on String Processing and Information Retrieval (pp. 221–238). doi:10.1007/978-3-030-32686-9_16