Sparse suffix and LCP array: Simple, direct, small, and fast

Ayad, Lorraine; Loukides, Grigorios; Pissis, Solon; Verbeek, Hilde

doi:10.1007/978-3-031-55598-5_11

L.A.K. Ayad (Lorraine), G. Loukides (Grigorios), S. Pissis (Solon) and H. Verbeek (Hilde)

2024-03-06

Presented at the 16th Latin American Theoretical Informatics Symposium, LATIN 2024 (March 2024), Puerto Varas, Chile

Sparse suffix sorting is the problem of sorting b = o(n) suffixes of a string of length n. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in O(n log b) time, in the worst case, or in O(n) time, when the total number of suffixes with an LCP value greater than 2^(⌊log(n/b)⌋+1) − 1 is in O(b / log b), matching the time of optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only 8b + o(b) machine words. We also show that our second algorithm can be trivially amended to work in O(n) time for any uniformly random string. Our algorithms are non-trivial space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in O(n log b) time [STACS 2014].
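To make the problem statement concrete, the following is a minimal sketch of what the sparse suffix array and sparse LCP array contain. It is a naive baseline that sorts the chosen suffixes by direct comparison, not either of the algorithms from the paper, and it meets neither the stated time bounds nor the space bounds. The function name `sparse_suffix_and_lcp`, the 0-based positions, and the convention that the first LCP entry is 0 are assumptions made for illustration.

```python
from typing import List, Tuple


def sparse_suffix_and_lcp(text: str, positions: List[int]) -> Tuple[List[int], List[int]]:
    """Naive illustration only: quadratic-time in the worst case.

    Returns (ssa, slcp) where ssa lists the given start positions in
    lexicographic order of their suffixes, and slcp[k] is the length of
    the longest common prefix of the suffixes at ssa[k-1] and ssa[k]
    (slcp[0] is set to 0 by convention here).
    """
    # Sort the b chosen positions by comparing the suffixes they start.
    ssa = sorted(positions, key=lambda i: text[i:])

    slcp = [0] * len(ssa)
    for k in range(1, len(ssa)):
        a, b = ssa[k - 1], ssa[k]
        length = 0
        # Extend the match character by character until a mismatch or end of text.
        while (a + length < len(text) and b + length < len(text)
               and text[a + length] == text[b + length]):
            length += 1
        slcp[k] = length
    return ssa, slcp


if __name__ == "__main__":
    # Suffixes of "banana" starting at 1, 3, 5 are "anana", "ana", "a".
    print(sparse_suffix_and_lcp("banana", [1, 3, 5]))
```

For the string "banana" and positions 1, 3, 5, the sorted order is "a", "ana", "anana", so the sketch yields the position order 5, 3, 1 with LCP values 0, 1, 3.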

Additional Metadata
Keywords	LCP array, Sparse suffix sorting, Suffix array, Suffix sorting
Persistent URL	doi.org/10.1007/978-3-031-55598-5_11
Series	Lecture Notes in Computer Science
Conference	16th Latin American Theoretical Informatics Symposium, LATIN 2024
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Ayad, L., Loukides, G., Pissis, S., & Verbeek, H. (2024). Sparse suffix and LCP array: Simple, direct, small, and fast. In LATIN: Latin American Symposium on Theoretical Informatics (pp. 162–177). doi:10.1007/978-3-031-55598-5_11
