Let be a string of length n and be the substring of starting at position i and ending at position j. A substring of is a repeat if it occurs more than once in; otherwise, it is a unique substring of. Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given string as input, the Shortest Unique Substring problem is to find a shortest substring of that does not occur elsewhere in. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over answering the following type of online queries efficiently. Given a range, return a shortest substring of with exactly one occurrence in. We present an -word data structure with query time, where is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al. ICALP 2018].

, , , ,
doi.org/10.1007/978-3-030-32686-9_18
Lecture Notes in Computer Science
International Symposium on String Processing and Information Retrieval
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Abedin, P., Ganguly, A., Pissis, S., & Thankachan, S. (2019). Range Shortest Unique Substring queries. In Proceedings of the International Symposium on String Processing and Information Retrieval (pp. 258–266). doi:10.1007/978-3-030-32686-9_18