Frequency-constrained substring complexity

Pissis, Solon; Shekelyan, Michael; Liu, Chang; Loukides, Grigorios

doi:10.1007/978-3-031-43980-3_28

S. Pissis (Solon), M. Shekelyan (Michael), C. Liu (Chang) and G. Loukides (Grigorios)

2023-09-20

Frequency-constrained substring complexity

Presented at the 30th International Symposium on String Processing and Information Retrieval, SPIRE 2023 (September 2023), Pisa, Italy

We introduce the notion of frequency-constrained substring complexity. For any finite string, it counts the distinct substrings of the string per length and frequency class. For a string x of length n and a partition of [n] in τ intervals, I = I1, . . . , Iτ , the frequency-constrained substring complexity of x is the function fx,I (i, j) that maps i, j to the number of distinct substrings of length i of x occurring at least αj and at most βj times in x, where Ij = [αj , βj ]. We extend this notion as follows. For a string x, a dictionary D of d strings (documents), and a partition of [d] in τ intervals I1, . . . , Iτ , we define a 2D array S = S[1 . . |x|, 1 . . τ ] as follows: S[i, j] is the number of distinct substrings of length i of x occurring in at least αj and at most βj documents, where Ij = [αj , βj ]. Array S can thus be seen as the distribution of the substring complexity of x into τ document frequency classes. We show that after a linear-time preprocessing of D, for any x and any partition of [d] in τ intervals given online, array S can be computed in near-optimal O(|x|τ log log d) time.

Additional Metadata
Keywords	Predecessor search, Substring complexity, Suffix tree
Persistent URL	doi.org/10.1007/978-3-031-43980-3_28
Series	Lecture Notes in Computer Science
Conference	30th International Symposium on String Processing and Information Retrieval, SPIRE 2023
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation APA Style AAA Style APA Style Cell Style Chicago Style Harvard Style IEEE Style MLA Style Nature Style Vancouver Style American-Institute-of-Physics Style Council-of-Science-Editors Style BibTex Format Endnote Format RIS Format CSL Format DOIs only Format	Pissis, S., Shekelyan, M., Liu, C., & Loukides, G. (2023). Frequency-constrained substring complexity. In Proceedings of the International Symposium on String Processing and Information Retrieval (pp. 345–352). doi:10.1007/978-3-031-43980-3_28

View at Publisher

Free Full Text ( Author Manuscript , 381kb )

Frequency-constrained substring complexity

Publication

Publication

Address

CWI researchers

Questions or comments?

Frequency-constrained substring complexity

Publication

Publication

Workflow

Workflow

Add Content