MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Patrick Lee, Alain Chirino Trujillo, Diana Cuevas Plancarte, Olumide Ebenezer Ojo, Xinyi Liu, Iyanuoluwa Shode, Yuan Zhao, Jing Peng, Anna Feldman
2024-01-26
Abstract: This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer.
Computation and Language
What problem does this paper attempt to address?
The paper studies how euphemisms can be processed computationally across multiple languages. Specifically, the researchers train a multilingual transformer model (XLM-RoBERTa) to disambiguate Potentially Euphemistic Terms (PETs), that is, to decide whether a term is used euphemistically or literally in a given context. The main objectives of the study are:

1. **Validation of Zero-shot Learning**: The study tests whether a model can disambiguate PETs in a language for which it saw no euphemism training data, relying only on what it learned from other languages.
2. **Comparison of Multilingual and Monolingual Models**: The study compares models trained on multilingual data with models trained on a single language, and checks whether the performance differences are statistically significant.
3. **Exploration of Cross-lingual Features**: The study focuses on euphemistic categories that recur across languages (such as death and bodily functions) and tests whether cross-lingual data from the same category is more important than within-language data from other categories.

Through these experiments, the researchers aim to clarify how multilingual models exploit cross-lingual similarities when disambiguating euphemisms, and what drives the cross-lingual transfer.
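The three training settings described above (monolingual, multilingual, and zero-shot cross-lingual) differ only in which languages contribute training data for a given test language. The following is a minimal sketch of that data selection, not the authors' code: the `Example` type, the `make_split` helper, the setting names, the language codes, and the toy sentences are all placeholders introduced here for illustration, and the actual fine-tuning of XLM-RoBERTa is left out.

```python
from typing import List, NamedTuple, Tuple


class Example(NamedTuple):
    lang: str   # language code (placeholder values below)
    text: str   # sentence containing a PET (placeholder strings below)
    label: int  # 1 = euphemistic use, 0 = literal use


def make_split(
    train_pool: List[Example],
    test_pool: List[Example],
    target_lang: str,
    setting: str,
) -> Tuple[List[Example], List[Example]]:
    """Select training data for one setting; the test set is always target_lang only."""
    test = [ex for ex in test_pool if ex.lang == target_lang]
    if setting == "monolingual":
        # Train only on the language being tested.
        train = [ex for ex in train_pool if ex.lang == target_lang]
    elif setting == "multilingual":
        # Train on every available language, including the target.
        train = list(train_pool)
    elif setting == "zero_shot":
        # Train on every language except the target.
        train = [ex for ex in train_pool if ex.lang != target_lang]
    else:
        raise ValueError(f"unknown setting: {setting}")
    return train, test


# Toy pools with placeholder language codes and sentences (not data from the paper).
TRAIN_POOL = [
    Example("lang_a", "lang_a training sentence 1", 1),
    Example("lang_a", "lang_a training sentence 2", 0),
    Example("lang_b", "lang_b training sentence 1", 1),
    Example("lang_b", "lang_b training sentence 2", 0),
    Example("lang_c", "lang_c training sentence 1", 1),
]
TEST_POOL = [
    Example("lang_a", "lang_a test sentence 1", 1),
    Example("lang_b", "lang_b test sentence 1", 0),
]

if __name__ == "__main__":
    for setting in ("monolingual", "multilingual", "zero_shot"):
        train, test = make_split(TRAIN_POOL, TEST_POOL, "lang_b", setting)
        langs = sorted({ex.lang for ex in train})
        print(f"{setting}: train languages={langs}, test size={len(test)}")
```

Holding the test set fixed while varying only the training languages is what makes the settings comparable: any score difference on the target language can then be attributed to the training data rather than to a different evaluation set.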