Efficient data structures for range shortest unique substring queries†

Abedin, Paniz; Ganguly, Arnab; Pissis, Solon; Thankachan, Sharma

doi:10.3390/a13110276


2020-11-01


Algorithms, Volume 13, Issue 11, pp. 1–9

Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers the following type of online query: given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an $O(n \log n)$-word data structure with $O(\log_w n)$ query time, where $w = \Omega(\log n)$ is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an $O(n)$-word data structure with $O(\sqrt{n}\,\log^{\epsilon} n)$ query time, where $\epsilon > 0$ is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
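To make the query semantics concrete, the following is a minimal brute-force sketch in Python. It is not the data structure from the paper: it simply rescans the window T[α, β] on every query, taking roughly cubic time in β − α + 1, whereas the structures described above answer a query in $O(\log_w n)$ or $O(\sqrt{n}\,\log^{\epsilon} n)$ time. The function name `range_sus`, the 1-indexed inclusive interface, and the choice to return the leftmost candidate among equally short answers are assumptions of this sketch.

```python
from collections import Counter
from typing import Tuple


def range_sus(T: str, alpha: int, beta: int) -> Tuple[int, int]:
    """Brute-force Range Shortest Unique Substring query (reference semantics only).

    Positions are 1-indexed and inclusive, as in T[1, n]. Returns (i, j) such that
    T[i, j] lies inside T[alpha, beta] and occurs exactly once there. Among the
    shortest such substrings, the leftmost is returned (a tie-breaking choice
    assumed for this sketch).
    """
    if not (1 <= alpha <= beta <= len(T)):
        raise ValueError("need 1 <= alpha <= beta <= n")
    window = T[alpha - 1:beta]  # T[alpha, beta] as a 0-indexed Python slice
    m = len(window)
    for length in range(1, m + 1):  # try candidate lengths in increasing order
        # Count every occurrence of each length-`length` substring inside the window.
        counts = Counter(window[s:s + length] for s in range(m - length + 1))
        for s in range(m - length + 1):  # scan left to right for a unique one
            if counts[window[s:s + length]] == 1:
                i = alpha + s  # convert back to a 1-indexed position in T
                return i, i + length - 1
    # Unreachable: the whole window always occurs exactly once in itself.
    raise AssertionError("no unique substring found")
```

For example, with T = banana, `range_sus("banana", 2, 6)` returns (3, 5): T[3, 5] = nan is the only length-3 substring occurring exactly once in the window anana, and no shorter substring of that window is unique. Querying the full range [1, n] answers the ordinary Shortest Unique Substring problem.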

Additional Metadata
Keywords	Geometric data structures, Heavy-light decomposition, Range queries, Shortest unique substring, Suffix tree
Persistent URL	doi.org/10.3390/a13110276
Journal	Algorithms
Citation	Abedin, P., Ganguly, A., Pissis, S., & Thankachan, S. (2020). Efficient data structures for range shortest unique substring queries†. Algorithms, 13(11), 1–9. doi:10.3390/a13110276
