Abstract: Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection on the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, a feature visualization study was conducted to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
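The MAML setup summarized in the abstract can be illustrated with a minimal sketch. It assumes a logistic-regression head over fixed, utterance-level embeddings and uses toy two-class Gaussian "language" tasks in place of ADIMA. The paper's actual classifier, learning rates, number of inner steps and episode construction are not given here, and every function name (`make_task`, `adapt`, `maml_step`, `demo`) is hypothetical. Each task has a support set used for one inner gradient step and a query set used for the meta-update. `k` plays the role of the shot size (50-200 in the paper); counting `k` per set rather than per class is an assumption. Because the head is logistic regression, the exact second-order MAML gradient for one inner step can be computed with a Hessian-vector product instead of a full Hessian.

```python
import math
import random

# Hypothetical MAML sketch, not the paper's code: a logistic-regression head
# over fixed utterance embeddings, with one inner gradient step per task.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(w, xs, ys, eps=1e-12):
    # Mean binary cross-entropy; each x ends with a constant 1.0 bias feature.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def grad(w, xs, ys):
    # Gradient of the mean BCE: (1/n) * sum (p - y) * x.
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        r = (sigmoid(dot(w, x)) - y) / len(xs)
        for j, xj in enumerate(x):
            g[j] += r * xj
    return g

def hvp(w, xs, v):
    # Hessian-vector product of the mean BCE: (1/n) * sum p(1-p) (x . v) x.
    out = [0.0] * len(w)
    for x in xs:
        p = sigmoid(dot(w, x))
        c = p * (1 - p) * dot(x, v) / len(xs)
        for j, xj in enumerate(x):
            out[j] += c * xj
    return out

def adapt(w, xs, ys, alpha):
    # Inner loop: one gradient step on the task's support set.
    return [wj - alpha * gj for wj, gj in zip(w, grad(w, xs, ys))]

def maml_step(w, tasks, alpha, beta):
    # Outer loop: average the query-loss gradient w.r.t. the initial weights.
    meta_g = [0.0] * len(w)
    for sx, sy, qx, qy in tasks:
        g_q = grad(adapt(w, sx, sy, alpha), qx, qy)
        h = hvp(w, sx, g_q)  # chain rule through the inner step: (I - alpha*H) g_q
        for j in range(len(w)):
            meta_g[j] += (g_q[j] - alpha * h[j]) / len(tasks)
    return [wj - beta * gj for wj, gj in zip(w, meta_g)]

def make_task(rng, k, direction):
    # Hypothetical toy "language": label 1 vs 0 separated along `direction`,
    # returned as (support_x, support_y, query_x, query_y) with k examples each.
    def sample(n):
        xs, ys = [], []
        for _ in range(n):
            y = rng.randint(0, 1)
            sign = 1.0 if y else -1.0
            xs.append([sign * d + rng.gauss(0.0, 0.5) for d in direction] + [1.0])
            ys.append(y)
        return xs, ys
    return (*sample(k), *sample(k))

def post_adaptation_loss(w, tasks, alpha):
    # Average query loss after adapting on each task's support set.
    return sum(loss(adapt(w, sx, sy, alpha), qx, qy)
               for sx, sy, qx, qy in tasks) / len(tasks)

def demo(seed=0, k=50, dim=4, alpha=0.1, beta=0.5, steps=50):
    rng = random.Random(seed)
    def new_task():
        # Tasks share a common direction plus a task-specific perturbation.
        return make_task(rng, k, [1.0 + rng.gauss(0.0, 0.3) for _ in range(dim)])
    held_out = [new_task() for _ in range(5)]
    w = [0.0] * (dim + 1)
    before = post_adaptation_loss(w, held_out, alpha)
    for _ in range(steps):
        w = maml_step(w, [new_task() for _ in range(4)], alpha, beta)
    return before, post_adaptation_loss(w, held_out, alpha)
```

Calling `demo()` compares the average post-adaptation query loss on held-out toy tasks before and after meta-training; it should drop because the toy tasks share a common decision direction. Removing the `- alpha * h[j]` term turns this into first-order MAML, a cheaper and commonly used approximation.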

What problem does this paper attempt to address?

The paper addresses cross-lingual audio abuse detection in low-resource settings. Specifically, the researchers use few-shot learning (FSL) to detect abusive language in audio in Indian languages. Because labelled data for these languages is usually scarce, conventional machine-learning methods struggle to perform well. The researchers therefore combine pre-trained audio representations (from Wav2Vec and Whisper) with the Model-Agnostic Meta-Learning (MAML) framework to study how cross-lingual audio abuse detection can be improved when data is limited.

### Main problems

1. **Audio abuse detection in low-resource languages**: how to detect abusive audio effectively across multiple Indian languages with little labelled data.
2. **Cross-lingual detection**: how to exploit the cross-lingual capabilities of pre-trained models to improve abuse detection across languages.
3. **Few-shot learning**: how to adapt quickly to new tasks through meta-learning when only a few labelled examples are available.

### Research background

- **Content moderation on social media platforms**: with the spread of audio-based social platforms (such as Twitter Spaces and Clubhouse), moderating audio content has become important, especially in multilingual countries such as India.
- **Limitations of existing methods**: pipelines that transcribe audio and then classify the text (ASR + NLP) can miss abuse, because abusive words that are not pronounced clearly may be dropped or mis-transcribed.
- **Advantages of pre-trained audio models**: models such as Wav2Vec and Whisper are trained on large-scale data and extract robust audio features that transfer to many downstream tasks.
### Research methods

- **Pre-trained audio feature extraction**: Wav2Vec and Whisper are used to extract audio features, which are then normalized with one of two schemes, L2 normalization or time-mean normalization.
- **Model-Agnostic Meta-Learning (MAML)**: the MAML framework is used to adapt quickly to new audio abuse detection tasks from a small number of samples.
- **Cross-lingual training and testing**: models are trained and tested on multiple Indian languages to evaluate their cross-lingual generalization.

### Experimental results

- **Accuracy**: experiments cover 10 Indian languages. Whisper features with L2 normalization perform best in the 50-shot and 100-shot settings, reaching a highest accuracy of 85.22%.
- **Feature visualization**: t-SNE plots of the features show that languages of the Dravidian family (such as Malayalam and Tamil) form tighter clusters, while languages of the Indo-Aryan family (such as Hindi and Punjabi) form more overlapping clusters.

### Conclusion

The study shows that, in low-resource settings, combining pre-trained audio features with meta-learning can effectively improve cross-lingual audio abuse detection. Beyond its technical contribution, the approach offers practical support for moderating audio content.
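The feature-normalization step listed under Research methods can be sketched as follows. The sketch assumes frame-level hidden states from a pre-trained encoder such as Wav2Vec or Whisper are already available as a T x D matrix (loading the models is out of scope). It reads "time-mean normalization" as averaging the frames over time and "L2 normalization" as rescaling that average to unit Euclidean length; the paper's exact definitions, encoder layer and pooling may differ, and the function names are hypothetical.

```python
import math

# Hypothetical sketch: turn a (T, D) matrix of frame-level encoder features into
# a fixed-length utterance vector. "time-mean" is read as averaging over time and
# "L2" as scaling that average to unit Euclidean norm; both are assumptions.

def temporal_mean(frames):
    # frames: list of T frame vectors, each of length D.
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]

def l2_normalize(vec, eps=1e-12):
    # Divide by the Euclidean norm; eps guards against an all-zero vector.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def utterance_embedding(frames, method="l2"):
    pooled = temporal_mean(frames)
    if method == "mean":
        return pooled
    if method == "l2":
        return l2_normalize(pooled)
    raise ValueError(f"unknown method: {method}")

frames = [[3.0, 0.0], [3.0, 8.0]]            # toy features, T=2, D=2
print(utterance_embedding(frames, "mean"))   # [3.0, 4.0]
print(utterance_embedding(frames, "l2"))     # [0.6, 0.8]
```

With L2 normalization every utterance vector has unit length, so dot products and distances computed by the downstream classifier depend only on the direction of the embedding, not its magnitude.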

Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
