Optimal string sanitization against strategic attackers

Bian, Pengxin; Theodorakopoulos, George; Pissis, Solon; Loukides, Grigorios

doi:10.1109/TIFS.2025.3610245

Strings (sequences of elements) are often disseminated to support applications, e.g., in bioinformatics, web analysis, and transportation. Unfortunately, this may expose sensitive patterns that model confidential knowledge. Concealing the occurrences of sensitive patterns in a string (e.g., by deleting some elements) while minimizing the associated quality loss has been the objective of several string sanitization methods. However, real-world attackers are likely to possess background knowledge about the string, e.g., an individual’s genome sequence is almost identical to a reference genome sequence. In addition, it is good security practice to assume that the attacker will know the algorithm that has been used to sanitize the string (Kerckhoffs’ principle). Yet, all existing methods fail to protect strings against such attackers, risking privacy breaches in critical applications. In our work, we consider for the first time how to defend against strategic attackers who possess such knowledge. To achieve this, we propose a novel framework to sanitize a string by probabilistically replacing carefully selected patterns. As part of this framework, we design three mathematical programming algorithms which compute the optimal replacement probabilities under different objectives and constraints, offering different privacy gain / quality loss tradeoffs. Our algorithms protect against strategic attackers using new concepts and measures, protect sensitive patterns of any length, and can construct one or more optimally sanitized strings that can be used in applications such as frequent pattern mining. Our experiments using five real-world datasets from different domains show that all our algorithms are substantially more effective than a natural baseline (e.g., they offer up to 2 times more privacy when they are configured to incur the same quality loss, and up to 3 times lower quality loss when they are configured to offer the same privacy). They also show that two “hybrid” algorithms that we propose, based on combining elements of the above algorithms, inherit the advantages of their constituent algorithms. These results, coupled with the generality of our approach, make our algorithms practical and beneficial for deployment.

Additional Metadata
Keywords	Data sanitization, Strings, Sensitive patterns
Persistent URL	doi.org/10.1109/TIFS.2025.3610245
Journal	IEEE Transactions on Information Forensics and Security
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation APA Style AAA Style APA Style Cell Style Chicago Style Harvard Style IEEE Style MLA Style Nature Style Vancouver Style American-Institute-of-Physics Style Council-of-Science-Editors Style BibTex Format Endnote Format RIS Format CSL Format DOIs only Format	Bian, P., Theodorakopoulos, G., Pissis, S., & Loukides, G. (2025). Optimal string sanitization against strategic attackers. IEEE Transactions on Information Forensics and Security. doi:10.1109/TIFS.2025.3610245

Optimal string sanitization against strategic attackers

Publication

Publication

Address

CWI researchers

Questions or comments?

Optimal string sanitization against strategic attackers

Publication

Publication

Workflow

Workflow

Add Content