CoLLAB: A Collaborative Approach for Multilingual Abuse Detection

Orchid Chetia Phukan, Yashasvi Chaurasia, Arun Balaji Buduru, Rajesh Sharma
2024-06-05
Abstract: In this study, we investigate representations from a paralinguistic pre-trained model (PTM) for Audio Abuse Detection (AAD), which have not previously been explored for this task. Our results demonstrate their superiority over other PTM representations on the ADIMA benchmark. Furthermore, combining PTM representations enhances AAD performance. Despite these improvements, challenges with cross-lingual generalizability remain, and certain languages require training in the same language. This demands individual models for different languages, leading to scalability, maintenance, and resource allocation issues and hindering the practical deployment of AAD systems in linguistically diverse real-world environments. To address this, we introduce CoLLAB, a novel framework that requires no additional training and allows seamless merging of models trained in different languages through weight averaging. This results in a unified model with competitive AAD performance across multiple languages.
Audio and Speech Processing
What problem does this paper attempt to address?
The main problem this paper addresses is Audio Abuse Detection (AAD) in a multilingual setting. Specifically, the paper focuses on the following aspects:

1. **Insufficient cross-language generalization**: Existing AAD models perform poorly on abusive audio content across languages, especially when a model is trained on one language and tested on another. As a result, models must be trained separately for each language, which increases system complexity, maintenance cost, and resource requirements.
2. **Model merging and unification**: To address this, the paper proposes a new framework, CoLLAB, which merges models trained on different languages into a single unified model through weight averaging, enabling efficient cross-language audio abuse detection.
3. **Effectiveness of different pre-trained models (PTMs)**: The paper also evaluates several pre-trained models (such as TRILLsson, Whisper, MMS, WavLM, and x-vector) on the AAD task, and verifies that combining multiple PTM representations can further improve AAD performance.

Through these studies, the paper aims to improve the practicality and efficiency of AAD systems in multilingual environments and to reduce the need for language-specific models, thereby supporting wider deployment of AAD technology in practice.
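The weight-averaging idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-language models share an identical architecture, that each model is stored as a mapping from parameter name to a flat list of values, and that the average is uniform unless coefficients are given. The function name `average_weights` and the toy models `model_lang_a` and `model_lang_b` are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: a model is a mapping from parameter name to flattened values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights],
                    coeffs: Optional[List[float]] = None) -> Weights:
    """Merge models with identical parameter shapes by (weighted) averaging."""
    if not models:
        raise ValueError("need at least one model to merge")
    if coeffs is None:
        # Uniform average: every language-specific model counts equally.
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    reference = models[0]
    for model in models[1:]:
        # Averaging only makes sense if all models have the same parameters.
        if model.keys() != reference.keys():
            raise ValueError("models have different parameter names")
        for name, values in reference.items():
            if len(model[name]) != len(values):
                raise ValueError(f"shape mismatch for parameter {name!r}")

    merged: Weights = {}
    for name, values in reference.items():
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(len(values))
        ]
    return merged


# Hypothetical toy models standing in for two language-specific AAD classifiers.
model_lang_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
model_lang_b = {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}

merged = average_weights([model_lang_a, model_lang_b])
print(merged)
```

Because the merge operates on already trained parameters, no gradient updates are involved, which matches the summary's description of a framework that needs no further training. Non-uniform coefficients could be passed through `coeffs` if some languages should be weighted more heavily, though whether the paper does so is not stated here.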