A Customized Text Privatization Mechanism with Differential Privacy

Jamie Cui, Fengran Mo, Cen Chen, Jian-Yun Nie, Huimin Chen
Abstract: In Natural Language Understanding (NLU) applications, training an effective model often requires a massive amount of data. However, real-world text data are scattered across different institutions and user devices. Sharing them directly with an NLU service provider poses serious privacy risks, because text often contains sensitive information that could be leaked. A typical way to protect privacy is to privatize the raw text directly and use Differential Privacy (DP) to quantify the level of protection. However, existing text privatization mechanisms that rely on dχ-privacy are not applicable to all similarity metrics and fail to achieve a good privacy-utility trade-off. This is primarily because (1) dχ-privacy places strict requirements on the similarity metric, and (2) these methods privatize every token in the text equally, giving each the same, excessively large output set. A poor privacy-utility trade-off impedes the adoption of current text privatization mechanisms in real-world applications. In this paper, we propose a Customized differentially private Text privatization mechanism (CusText) that assigns each input token a customized output set, providing more advanced, adaptive privacy protection at the token level. It also removes the restriction on similarity metrics imposed by the dχ-privacy notion by making the mechanism satisfy ε-DP. Furthermore, we provide two new text privatization strategies that boost the utility of privatized text without compromising privacy, and we design a new attack strategy to further evaluate the protection level of our mechanism empirically from an attacker's perspective. We also conduct extensive experiments on two widely used datasets to demonstrate that our …
Computer Science