Parameterized text indexing with one wildcard

Ganguly, Arnab; Hon, Wing-Kai; Huang, Yu-An; Pissis, Solon; Shah, Rahul; Thankachan, Sharma

doi:10.1109/DCC.2019.00023

A. Ganguly (Arnab), W.-K. Hon (Wing-Kai), Y.-A. Huang (Yu-An), S. Pissis (Solon), R. Shah (Rahul) and S.V. Thankachan (Sharma)

2019-03-26

Parameterized text indexing with one wildcard

Presented at the 2019 Data Compression Conference, DCC 2019 (March 2019), Snowbird, Utah, USA

Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed to Y by renaming the character X[i] to the character Y[i] for 1 ≤ i ≤ |X| using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as: Index a text T of n characters over an alphabet set Σ of size σ, such that whenever a pattern P[1, p] comes as a query, we can report all occ parameterized occurrences of P in T. A position i [1, n] is a parameterized occurrence of P in T, iff P and T[i,(i+p-1)] are a parameterized match. We study an interesting generalization of this problem, where the pattern contains one wildcard character φ ∉ Σ that matches with any other character in Σ. Therefore, for a pattern P[1, p] = P₁φP₂, our task is to report all positions i in T, such that the string P_1 P_2 and the string obtained by concatenating T[i,(i+|P₁|-1)] and T[(i+|P₁|+1),(i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p+occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space but with a higher query cost of O(p(log log n+logσ)+occ logσ).

Additional Metadata
Keywords	Heavy Path, Parameterized Text Indexing, Succinct Data Structures, Suffix Tree, Wildcard
Persistent URL	doi.org/10.1109/DCC.2019.00023
Conference	2019 Data Compression Conference, DCC 2019
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation APA Style AAA Style APA Style Cell Style Chicago Style Harvard Style IEEE Style MLA Style Nature Style Vancouver Style American-Institute-of-Physics Style Council-of-Science-Editors Style BibTex Format Endnote Format RIS Format CSL Format DOIs only Format	Ganguly, A., Hon, W.-K., Huang, Y.-A., Pissis, S., Shah, R., & Thankachan, S. (2019). Parameterized text indexing with one wildcard. In Data Compression Conference Proceedings (pp. 152–161). doi:10.1109/DCC.2019.00023

View at Publisher

Parameterized text indexing with one wildcard

Publication

Publication

Address

CWI researchers

Questions or comments?

Parameterized text indexing with one wildcard

Publication

Publication

Workflow

Workflow

Add Content