Towards Quantifying The Privacy Of Redacted Text

Vaibhav Gusain, Douglas Leith
DOI: https://doi.org/10.1007/978-3-031-28238-6_32
2024-10-10
Abstract: In this paper we propose the use of a k-anonymity-like approach for evaluating the privacy of redacted text. Given a piece of redacted text, we use a state-of-the-art transformer-based deep learning network to reconstruct the original text. This generates multiple full texts that are consistent with the redacted text, i.e. which are grammatical, have the same non-redacted words, etc. We represent each of these using an embedding vector that captures sentence similarity. In this way we can estimate the number, diversity and quality of full texts consistent with the redacted text and so evaluate privacy.
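To make the k-anonymity-like idea concrete, the following is a minimal sketch of the counting step only, not the authors' implementation. It assumes the candidate reconstructions and their sentence embeddings have already been produced by some model. The toy vectors, the greedy grouping rule, the function names and the 0.9 cosine threshold are all illustrative assumptions that do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def count_distinct(embeddings, threshold=0.9):
    """Greedily group near-duplicate embeddings and return the group count.

    A vector starts a new group unless it is at least `threshold`-similar
    to the representative of an existing group. The threshold is an
    assumed value chosen for illustration.
    """
    representatives = []
    for vec in embeddings:
        if not any(cosine(vec, rep) >= threshold for rep in representatives):
            representatives.append(vec)
    return len(representatives)


# Hypothetical reconstructions of "Alice [REDACTED] [REDACTED] in Dublin."
# with made-up 3-dimensional embeddings (real sentence embeddings are
# much higher-dimensional).
candidates = {
    "Alice met Bob in Dublin.": [1.0, 0.0, 0.1],
    "Alice met Rob in Dublin.": [0.98, 0.05, 0.1],
    "Alice sued Bob in Dublin.": [0.1, 1.0, 0.0],
    "Alice paid Bob in Dublin.": [0.0, 0.1, 1.0],
}

k = count_distinct(list(candidates.values()))
print(f"{len(candidates)} candidates, {k} semantically distinct groups")
```

In this toy setting, a larger number of distinct groups would suggest that more meaningfully different full texts are consistent with the redaction, which is the intuition behind a k-anonymity-like score. A count of one would suggest the redacted content is effectively recoverable.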
Machine Learning