CoLLAB: A Collaborative Approach for Multilingual Abuse Detection

Orchid Chetia Phukan, Yashasvi Chaurasia, Arun Balaji Buduru, Rajesh Sharma

2024-06-05

Abstract: In this study, we investigate representations from a paralinguistic Pre-Trained Model (PTM) for Audio Abuse Detection (AAD), which have not previously been explored for this task. Our results demonstrate their superiority compared to other PTM representations on the ADIMA benchmark. Furthermore, combining PTM representations enhances AAD performance. Despite these improvements, challenges with cross-lingual generalizability remain, and certain languages require training in the same language. This demands individual models for different languages, leading to scalability, maintenance, and resource allocation issues and hindering the practical deployment of AAD systems in linguistically diverse real-world environments. To address this, we introduce CoLLAB, a novel framework that doesn't require training and allows seamless merging of models trained in different languages through weight-averaging. This results in a unified model with competitive AAD performance across multiple languages.

Audio and Speech Processing

What problem does this paper attempt to address?

The paper addresses the challenges of Audio Abuse Detection (AAD) in a multilingual environment. Specifically, it focuses on the following aspects:

1. **Insufficient cross-language generalization**: Existing AAD models perform poorly on abusive audio in languages they were not trained on, that is, when a model is trained on one language and tested on another. As a result, a separate model has to be trained for each language, which increases system complexity, maintenance cost, and resource requirements.
2. **Model merging and unification**: To address this, the paper proposes a new framework, CoLLAB, which merges models trained on different languages into a single unified model through weight averaging, without additional training, so that one model can handle abuse detection across languages.
3. **Effectiveness of different pre-trained models (PTMs)**: The paper also compares several types of pre-trained models (such as TRILLsson, Whisper, MMS, WavLM, and x-vector) on the AAD task and shows that combining multiple PTM representations can further improve AAD performance.

Through these studies, the paper aims to make AAD systems more practical and efficient in multilingual environments by reducing the need for language-specific models, and thereby to support wider deployment of AAD technology in practice.
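As a rough illustration of the weight-averaging idea, the sketch below merges per-language models by averaging their parameters element-wise. It is not the authors' implementation: the function name `average_weights`, the representation of a model as a dictionary of flat parameter lists, the optional `coeffs` argument, and the parameter names are all assumptions made for this example. It also assumes every model shares the same architecture, so parameter names and shapes match.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_weights(
    models: Sequence[StateDict],
    coeffs: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge models with identical parameter layouts by (weighted) averaging."""
    if not models:
        raise ValueError("need at least one model to merge")
    if coeffs is None:
        # Default: plain uniform average over all models.
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    reference = models[0]
    for model in models[1:]:
        if set(model) != set(reference):
            raise ValueError("models must have the same parameter names")

    merged: StateDict = {}
    for name, ref_values in reference.items():
        if any(len(model[name]) != len(ref_values) for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across models for this parameter.
        merged[name] = [
            sum(c * model[name][i] for c, model in zip(coeffs, models))
            for i in range(len(ref_values))
        ]
    return merged


# Toy example: two classifiers trained on different languages (made-up values).
model_lang_a = {"classifier.weight": [0.25, -0.5], "classifier.bias": [0.0]}
model_lang_b = {"classifier.weight": [0.75, 0.0], "classifier.bias": [1.0]}

unified = average_weights([model_lang_a, model_lang_b])
print(unified)
```

No gradient steps are involved: the merged model is computed directly from the existing checkpoints, which is what makes this kind of merging training-free. Whether a uniform average is the best choice of coefficients is not something this sketch addresses.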
