pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks

Juan Manuel Pérez, Mariela Rajngewerc, Juan Carlos Giudici, Damián A. Furman, Franco Luque, Laura Alonso Alemany, María Vanina Martínez
2024-07-14
Abstract: In recent years, the extraction of opinions and information from user-generated text has attracted a lot of interest, largely due to the unprecedented volume of content in Social Media. However, social researchers face some issues in adopting cutting-edge tools for these tasks, as they are usually behind commercial APIs, unavailable for other languages than English, or very complex to use for non-experts. To address these issues, we present pysentimiento, a comprehensive multilingual Python toolkit designed for opinion mining and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python library, allowing researchers to leverage these techniques. We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets, including an evaluation of fairness in the results.
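As a rough illustration of the "easy-to-use" claim, the sketch below wraps the library's `create_analyzer` entry point in a small helper. The helper name `analyze_texts` and the example sentence are illustrative choices, not taken from the paper. Running it assumes `pysentimiento` is installed and that model weights can be downloaded on first use.

```python
def analyze_texts(texts, task="sentiment", lang="es"):
    """Return a (label, probabilities) pair for each input text.

    `analyze_texts` is a hypothetical convenience wrapper; only
    `create_analyzer` and `predict` come from pysentimiento itself.
    """
    # Imported lazily: requires `pip install pysentimiento`, and the
    # first call downloads the pre-trained model for the task/language.
    from pysentimiento import create_analyzer

    analyzer = create_analyzer(task=task, lang=lang)
    results = []
    for text in texts:
        prediction = analyzer.predict(text)
        # `output` holds the predicted label(s); `probas` maps labels to scores.
        results.append((prediction.output, prediction.probas))
    return results


if __name__ == "__main__":
    for label, probas in analyze_texts(["Qué gran jugador es Messi"]):
        print(label, probas)
```

Changing `task` or `lang` selects a different pre-trained model, which is how a single interface can cover several tasks and languages.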
Computation and Language
### What problem does this paper attempt to address?

This paper primarily aims to address the following issues:

1. **Limitations of opinion mining tools**: Existing opinion mining tools have several problems:
   - Many advanced tools sit behind commercial APIs and require payment.
   - Support for languages other than English is limited or absent.
   - Tools are too complex for non-expert users.
2. **Providing a multilingual toolkit**: To address these issues, the authors developed `pysentimiento`, a Python toolkit for opinion mining and other social natural language processing tasks on social media text. It supports Spanish, English, Italian, and Portuguese and provides an easy-to-use API that gives researchers access to state-of-the-art models.
3. **Performance evaluation and fairness analysis**: The authors also evaluated several pre-trained language models across tasks (such as sentiment analysis, emotion detection, hate speech detection, and irony detection), languages, and datasets, and assessed fairness by checking whether performance differs substantially between groups.

Through these efforts, the paper hopes to advance opinion mining research and provide a more open, user-friendly, and efficient toolkit to support research in related fields.
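The fairness check described above can be pictured as comparing a performance metric across groups. The summary does not state which metric or grouping the authors used, so the following is only a minimal sketch under assumed choices: plain accuracy as the metric, and the largest difference between any two groups as the gap. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Compute classification accuracy separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_group):
    """Largest accuracy difference between any two groups (assumed gap measure)."""
    values = list(per_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy data: gold labels, model predictions, and a group tag per example.
    y_true = ["POS", "NEG", "POS", "NEG"]
    y_pred = ["POS", "NEG", "NEG", "NEG"]
    groups = ["a", "a", "b", "b"]

    per_group = accuracy_by_group(y_true, y_pred, groups)
    print(per_group)
    print(max_accuracy_gap(per_group))
```

A small gap suggests the model performs similarly across groups under this metric. A real evaluation would likely use a task-appropriate metric such as macro F1 and larger samples per group.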