2019-03-26

# Constructing antidictionaries in output-sensitive space

## Publication

### Publication

*Presented at the 2019 Data Compression Conference, DCC 2019 (March 2019), Snowbird, Utah, USA*

A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y

_{1}, y_{2},...,y_{k}over an alphabet Σ, we are asked to compute the set M(y^{1#...#}y^{k})^{ℓ}of minimal absent words of length at most ℓ of word y=y_{1}#y_{2}#...#y_{k}, #∉Σ. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. This computation generally requires Ω(n) space for n=|y| using any of the plenty available O(n)-time algorithms. This is because an Ω(n)-sized text index is constructed over y which can be impractical for large n. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when ||M(y^{1#...#}y^{N})^{ℓ}|| =o(n), for all N ϵ[1, k]. For instance, in the human genome, n ≈ 3 × 10^{9}but ||M (y^{1#...#yk})^{12}|| ≈ 10^{6}. We consider a constant-sized alphabet for stating our results. We show that all M(y_{1})^{ℓ},...,M(y_{1#...#}y_{k})^{ℓ}can be computed in O(kn+Σ_{N=1}^{k}||M(y_{1}#...#(y_{N})^{ℓ}||) total time using O(MaxIn+MaxOut) space, where MaxIn is the length of the longest word in y_{1},...,y_{k}and MaxOut=max{||M (y_{1)#...#}(y_{N})^{ℓ}||:N ϵ[1, k]. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.Additional Metadata | |
---|---|

Keywords | Absent words, Antidictionaries, Data compression, Output sensitive algorithms, String algorithms |

Persistent URL | dx.doi.org/10.1109/DCC.2019.00062 |

Conference | 2019 Data Compression Conference, DCC 2019 |

Citation |
Ayad, Lorraine, Badkobeh, G, Fici, G, Heliou, A, & Pissis, S. (2019). Constructing antidictionaries in output-sensitive space. In
Data Compression Conference Proceedings (pp. 538–547). doi:10.1109/DCC.2019.00062 |