A degenerate symbol over an alphabet Σ is a non-empty subset of Σ, and a sequence of such symbols is a degenerate string. We investigate the exact computation of maximal degenerate palindromes with gaps and mismatches. We present an algorithm which, given a degenerate string of length n and natural number parameters g and m, efficiently detects exact maximal palindromes with a gap size ≤g, and ≤m permitted mismatches. We show that it can be done in O(k|Σ|(k+log⁡|Σ|)+(k+g+m)n) time and O((g+m)n) space, where k represents an upper bound on the number of degenerate symbols contained in the string. Furthermore, we also show that the problem of factorisation a string into maximal degenerate palindromes with gaps and mismatches can also be done in O(k|Σ|(k+log⁡|Σ|)+(k+g+m)n) time and O((g+m)n) space. An inverted repeat is a specific type of palindrome which refers to a nucleotide sequence followed by its reverse complement. Our results can also be used to find maximal inverted repeated sequences with gaps and mismatches, where changing the structure of palindromes to inverted repeats does not affect the overall running time. Finally we demonstrate our algorithm on several strains of SARS-CoV-2, and quantify the number of inverted repeats found with ≤0,1,2 mismatches and ≤0,10,100 gap size.

, , , ,
doi.org/10.1016/j.tcs.2023.114182
Theoretical Computer Science
Networks and Optimization

Alzamel, M., Hampson, C., Iliopoulos, C., Lim, Z., Pissis, S., Vlachakis, D., & Watts, S. (2023). Maximal degenerate palindromes with gaps and mismatches. Theoretical Computer Science, 978, 114182:1–114182:16. doi:10.1016/j.tcs.2023.114182