Strings (sequences of elements) are often disseminated to support applications, e.g., in bioinformatics, web analysis, and transportation. Unfortunately, this may expose sensitive patterns that model confidential knowledge. Concealing the occurrences of sensitive patterns in a string (e.g., by deleting some elements) while minimizing the associated quality loss has been the objective of several string sanitization methods. However, real-world attackers are likely to possess background knowledge about the string, e.g., an individual’s genome sequence is almost identical to a reference genome sequence. In addition, it is good security practice to assume that the attacker will know the algorithm that has been used to sanitize the string (Kerckhoffs’ principle). Yet, all existing methods fail to protect strings against such attackers, risking privacy breaches in critical applications. In our work, we consider for the first time how to defend against strategic attackers who possess such knowledge. To achieve this, we propose a novel framework to sanitize a string by probabilistically replacing carefully selected patterns. As part of this framework, we design three mathematical programming algorithms which compute the optimal replacement probabilities under different objectives and constraints, offering different privacy gain / quality loss tradeoffs. Our algorithms protect against strategic attackers using new concepts and measures, protect sensitive patterns of any length, and can construct one or more optimally sanitized strings that can be used in applications such as frequent pattern mining. Our experiments using five real-world datasets from different domains show that all our algorithms are substantially more effective than a natural baseline (e.g., they offer up to 2 times more privacy when they are configured to incur the same quality loss, and up to 3 times lower quality loss when they are configured to offer the same privacy). They also show that two “hybrid” algorithms that we propose, based on combining elements of the above algorithms, inherit the advantages of their constituent algorithms. These results, coupled with the generality of our approach, make our algorithms practical and beneficial for deployment.

, ,
doi.org/10.1109/TIFS.2025.3610245
IEEE Transactions on Information Forensics and Security
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Bian, P., Theodorakopoulos, G., Pissis, S., & Loukides, G. (2025). Optimal string sanitization against strategic attackers. IEEE Transactions on Information Forensics and Security. doi:10.1109/TIFS.2025.3610245