Abstract: Authorship obfuscation aims to disguise the identity of an author within a text by altering the writing style, vocabulary, syntax, and other linguistic features associated with the text's author. This alteration needs to balance privacy and utility. While strong obfuscation techniques can effectively hide the author's identity, they often degrade the quality and usefulness of the text for its intended purpose. Conversely, maintaining high utility tends to provide insufficient privacy, making it easier for an adversary to de-anonymize the author. Thus, achieving an optimal trade-off between these two conflicting objectives is crucial. In this paper, we propose TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization, a new unsupervised authorship obfuscation method whose goal is to optimize the privacy-utility trade-off by regenerating the entire text considering its downstream utility. Our approach leverages policy optimization as a fine-tuning paradigm over small language models in order to rewrite texts while protecting the author's identity and preserving downstream task utility. We show that our approach substantially reduces the accuracy of attackers while preserving utility. We make our code and models publicly available.

What problem does this paper attempt to address?

The paper addresses the protection of author identity in text: how to modify a text so that its original author is hidden without significantly reducing the text's utility. The research pursues the following goals:

1. **Balancing Privacy and Usability**: The goal of Authorship Obfuscation (AO) is to modify features such as text style, vocabulary, and syntax while maintaining the text's usability and purpose. Excessive modification damages the text's quality and utility, while insufficient modification fails to protect the author's privacy.
2. **Task-Oriented Authorship Obfuscation**: A new method called TAROT (Task-Oriented Authorship Obfuscation Using Policy Optimization) is proposed. This method uses policy optimization algorithms to rewrite the text so as to maximize privacy protection while retaining utility for downstream tasks.
3. **Addressing Limitations of Existing Methods**: Existing authorship obfuscation methods often focus on minimizing changes to the text content in order to maintain semantic coherence and meaning, but this approach is often insufficient against real-world attack scenarios.

### Main Contributions

1. **Framework Design**: A new framework is designed for task-oriented authorship obfuscation. It guides generative models through policy optimization algorithms to maximize privacy protection while retaining utility for downstream tasks.
2. **Model Proposal**: The TAROT model is proposed, an unsupervised method that can obfuscate text without prior knowledge of the author and can maintain text utility across various tasks. The model has two versions, TAROT-PPO and TAROT-DPO, based on different policy optimization algorithms.
3. **Evaluation Experiments**: Evaluation experiments were conducted on three datasets covering movie reviews, blog posts, and academic documents. They demonstrate that TAROT can perform authorship obfuscation for different tasks across different datasets while protecting the author's identity.

Through this work, the paper aims to provide a method that effectively protects the author's privacy while maintaining the text's utility, achieving a good balance between privacy protection and text usability.
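The summary does not state how TAROT scores a rewritten text, so the following is only a minimal sketch of how a privacy-utility trade-off could be turned into a training signal. It assumes, hypothetically, that each text is represented by two embedding vectors (one capturing authorship style, one capturing task-relevant content) and that a single weight `privacy_weight` balances the two objectives. The function names `tradeoff_reward` and `preference_pair` are illustrative and do not come from the paper. A scalar reward of this kind is the sort of signal PPO consumes, while DPO instead trains on (chosen, rejected) pairs, which can be derived by ranking candidate rewrites.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tradeoff_reward(
    orig_style: Sequence[float],
    new_style: Sequence[float],
    orig_task: Sequence[float],
    new_task: Sequence[float],
    privacy_weight: float = 0.5,  # hypothetical knob, not taken from the paper
) -> float:
    """Hypothetical reward: high when style moves away from the original
    (privacy) and task-relevant content stays close to it (utility)."""
    privacy = 1.0 - cosine(orig_style, new_style)
    utility = cosine(orig_task, new_task)
    return privacy_weight * privacy + (1.0 - privacy_weight) * utility


def preference_pair(rewards: Sequence[float]) -> Tuple[int, int]:
    """Return (chosen_index, rejected_index) for a DPO-style preference pair:
    the highest-reward candidate is chosen, the lowest-reward one rejected."""
    indices = range(len(rewards))
    chosen = max(indices, key=lambda i: rewards[i])
    rejected = min(indices, key=lambda i: rewards[i])
    return chosen, rejected


if __name__ == "__main__":
    # Toy vectors: candidate A changes style but keeps content; B changes nothing.
    reward_a = tradeoff_reward([1, 0], [0, 1], [1, 0], [1, 0])
    reward_b = tradeoff_reward([1, 0], [1, 0], [1, 0], [1, 0])
    print(reward_a, reward_b, preference_pair([reward_b, reward_a]))
```

In this toy setup, a rewrite that changes the style vector while leaving the task vector intact scores higher than an unchanged copy, which is the behaviour the trade-off described above is meant to encourage. The embedding models, the weighting, and the pair-selection rule would all need to be replaced by whatever the actual method uses.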

TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

Keep It Private: Unsupervised Privatization of Online Text

ALISON: Fast and Effective Stylometric Authorship Obfuscation

Avengers Ensemble! Improving Transferability of Authorship Obfuscation

A Girl Has A Name: Detecting Authorship Obfuscation

UID as a Guiding Metric for Automated Authorship Obfuscation

TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations

Author Obfuscation Using Generalised Differential Privacy

Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text

JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models

UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Authorship Style Transfer with Policy Optimization

Stochastic Optimization of Program Obfuscation.

IDT: Dual-Task Adversarial Attacks for Privacy Protection

Authorship Obfuscation in Multilingual Machine-Generated Text Detection

InfoScrub: Towards Attribute Privacy by Targeted Obfuscation

ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

SynTF: Synthetic and Differentially Private Term Frequency Vectors for Privacy-Preserving Text Mining

Obfuscating Provenance-Based Forensic Investigations with Mapping System Meta-Behavior

Style Pooling: Automatic Text Style Obfuscation for Improved Classification Fairness