Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi
2024-12-03
Abstract: Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case, in Indian languages using Few Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200) evaluating the impact of limited data on performance. Additionally, a feature visualization study was conducted to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses cross-lingual audio abuse detection in low-resource settings. Specifically, the researchers use few-shot learning (FSL) to detect abusive language in audio in Indian languages. Because labeled data for these languages is usually scarce, conventional machine-learning methods struggle in this setting. The researchers therefore combine pre-trained audio representations (such as Wav2Vec and Whisper) with the Model-Agnostic Meta-Learning (MAML) framework to explore how cross-lingual audio abuse detection can be improved with limited data.

### Main problems

1. **Audio abuse detection in low-resource languages**: How to detect audio abuse effectively in multiple Indian languages with a limited amount of data.
2. **Cross-lingual detection**: How to use the cross-lingual capabilities of pre-trained models to improve abuse detection across languages.
3. **Few-shot learning**: How to adapt quickly to new tasks through meta-learning when only a small amount of labeled data is available.

### Research background

- **Regulatory requirements of social media platforms**: As audio-based social platforms (such as Twitter Spaces and Clubhouse) have grown, moderating audio content has become particularly important, especially in multilingual countries such as India.
- **Limitations of existing methods**: Pipelines that first transcribe speech and then classify the text (ASR + NLP) are limited for audio abuse detection, because abusive words that are not pronounced clearly may be missed in transcription.
- **Advantages of pre-trained audio models**: Pre-trained audio models (such as Wav2Vec and Whisper) are trained on large-scale data and can extract robust audio features that are suitable for multiple tasks.

### Research methods

- **Pre-trained audio feature extraction**: Wav2Vec and Whisper are used to extract audio features, which are then normalized (L2 normalization and temporal-mean normalization).
- **Model-Agnostic Meta-Learning (MAML)**: The MAML framework is used to adapt quickly to new audio abuse detection tasks from a small number of samples.
- **Cross-lingual training and testing**: Training and testing are carried out on multiple Indian languages to evaluate the model's cross-lingual generalization.

### Experimental results

- **Accuracy**: Experiments were carried out on 10 Indian languages. Whisper features combined with L2 normalization perform best in the 50-shot and 100-shot settings, with the highest accuracy reaching 85.22%.
- **Feature visualization**: t-SNE visualization was used to examine how features from different languages cluster. Languages in the Dravidian family (such as Malayalam and Tamil) form tighter clusters, while languages in the Indo-Aryan family (such as Hindi and Punjabi) form more overlapping clusters.

### Conclusion

This research shows that, in a low-resource setting, pre-trained audio features combined with meta-learning can effectively improve cross-lingual audio abuse detection. Beyond its technical contribution, the approach also supports the moderation of audio content in practical applications.
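
The feature normalization step listed under Research methods can be illustrated with a minimal NumPy sketch. The summary names the two normalizations but does not give their formulas, so this is one plausible reading rather than the authors' implementation: temporal-mean normalization is assumed to average frame-level features over the time axis, and L2 normalization is assumed to scale a clip-level vector to unit Euclidean length. The function names, the `(149, 768)` feature shape, and the random array standing in for encoder output are all hypothetical.

```python
import numpy as np


def temporal_mean(frames):
    """Average frame-level features over time: (T, D) -> (D,).

    Assumption: this is what "temporal-mean normalization" refers to.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 2:
        raise ValueError("expected an array of shape (T, D)")
    return frames.mean(axis=0)


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit Euclidean norm.

    eps guards against division by zero for an all-zero vector.
    """
    vec = np.asarray(vec, dtype=np.float64)
    return vec / max(np.linalg.norm(vec), eps)


# Random values stand in for frame-level output of a pre-trained encoder
# (hypothetical shape: 149 frames, 768 dimensions per frame).
rng = np.random.default_rng(0)
frames = rng.normal(size=(149, 768))

pooled = temporal_mean(frames)      # fixed-size clip representation
unit_vec = l2_normalize(pooled)     # same direction, unit length
```

Either output is a fixed-size vector per clip, which is the form a few-shot classifier trained with MAML would need as input; whether the two steps are applied together or as alternatives is not stated in the summary.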