Hide and mine in strings: Hardness and algorithms

Bernardini, Giulia; Conte, Alessio; Gourdel, Garance; Grossi, Roberto; Loukides, Grigorios; Pisanti, Nadia; Pissis, Solon; Punzi, Giulia; Stougie, Leen; Sweering, Michelle

doi:10.1109/ICDM50108.2020.00103

G. Bernardini (Giulia), A. Conte (Alessio), G. Gourdel (Garance), R. Grossi (Roberto), G. Loukides (Grigorios), N. Pisanti (Nadia), S. Pissis (Solon), G. Punzi (Giulia), L. Stougie (Leen) and M.J.M. Sweering (Michelle)

2020-11-17


Presented at the 20th IEEE International Conference on Data Mining, ICDM 2020 (November 2020), Sorrento, Italy

We initiate a study on the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns but, in doing so, introduce a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution here is twofold. First, we present several hardness results for different variants of this problem, essentially showing that these variants cannot be solved, or even approximated, in polynomial time. Second, we propose integer linear programming formulations for these variants, together with algorithms to solve them that run in polynomial time under certain realistic assumptions on the problem parameters.

Additional Metadata
Keywords	Data privacy, Data sanitization, Frequent pattern mining, Knowledge hiding, String algorithms
Persistent URL	doi.org/10.1109/ICDM50108.2020.00103
Conference	20th IEEE International Conference on Data Mining, ICDM 2020
Project	Networks
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Bernardini, G., Conte, A., Gourdel, G., Grossi, R., Loukides, G., Pisanti, N., … Sweering, M. (2020). Hide and mine in strings: Hardness and algorithms. In 20th IEEE International Conference on Data Mining (pp. 924–929). doi:10.1109/ICDM50108.2020.00103


