STraDa: A Singer Traits Dataset

Yuexuan Kong, Viet-Anh Tran, Romain Hennequin

2024-06-06

Abstract: There is a limited number of large-scale public datasets that contain downloadable music audio files and rich lead singer metadata. To provide such a dataset to benefit research in singing voices, we created the Singer Traits Dataset (STraDa) with two subsets: automatic-strada and annotated-strada. The automatic-strada subset contains twenty-five thousand tracks across numerous genres and languages from more than five thousand unique lead singers, and includes cross-validated lead singer metadata as well as other track metadata. The annotated-strada subset consists of two hundred tracks that are balanced in terms of 2 genders, 5 languages, and 4 age groups. To show how its rich metadata and downloadable audio files support model training and bias analysis, we benchmarked singer sex classification (SSC) and conducted a bias analysis.

Sound, Audio and Speech Processing

What problem does this paper attempt to address?

This paper addresses the lack of large-scale public datasets that contain downloadable music audio files and rich lead-singer metadata, a gap that particularly affects singing voice research. Specifically:

1. **Insufficient scale and diversity of datasets**: Existing public datasets are usually small and cover a limited range of music genres and languages, so they cannot meet the requirements of large-scale machine-learning model training.
2. **Unclear annotation of lead-singer information**: In many existing datasets, lead-singer information is not clearly annotated. Songs with multiple lead singers are especially prone to annotation errors.
3. **Lack of balanced subsets for evaluation and bias analysis**: Evaluating model performance and conducting bias analysis requires a dataset with balanced annotations in terms of gender, language, and age group.

To solve these problems, the authors created a dataset named STraDa (Singer Traits Dataset), which contains two subsets:

- **automatic-strada**: A large-scale subset created automatically, containing 25,000 songs from more than 5,000 unique lead singers, covering multiple genres and languages, and providing cross-validated lead-singer metadata.
- **annotated-strada**: A small-scale subset annotated manually, containing 200 songs balanced across two genders, five languages, and four age groups.

Together, the two subsets provide training data for singing-voice-related tasks as well as test data for model evaluation and bias analysis.
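As a rough illustration of the kind of bias analysis a balanced test subset makes possible, the sketch below computes classification accuracy separately for each value of one metadata attribute and reports the gap between the best and worst group. It is not the authors' evaluation code. The function name `accuracy_by_group`, the record fields (`label`, `predicted`, `language`), and the toy records are all assumptions made for this example and do not reflect the actual STraDa file layout or results.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group value: accuracy} for one metadata attribute.

    `records` is a list of dicts with hypothetical keys "label",
    "predicted", and the attribute named by `group_key`.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["predicted"] == rec["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records invented for illustration only; not taken from STraDa.
toy_records = [
    {"label": "female", "predicted": "female", "language": "en"},
    {"label": "male", "predicted": "male", "language": "en"},
    {"label": "female", "predicted": "male", "language": "fr"},
    {"label": "male", "predicted": "male", "language": "fr"},
]

per_language = accuracy_by_group(toy_records, "language")
# Gap between the best- and worst-served group: one simple bias indicator.
gap = max(per_language.values()) - min(per_language.values())
print(per_language, gap)
```

Because a balanced subset gives every group the same number of tracks, per-group accuracies computed this way rest on equal sample sizes, which makes the gap easier to interpret than on a skewed test set.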

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

Singer Identity Representation Learning using Self-Supervised Techniques

ChoralSynth: Synthetic Dataset of Choral Singing

Constructing a Singing Style Caption Dataset

Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus

DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Creating an A Cappella Singing Audio Dataset for Automatic Jingju Singing Evaluation Research

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

A Novel Framework for Efficient Automated Singer Identification in Large Music Databases

SingingHead: A Large-scale 4D Dataset for Singing Head Animation

A Comparative Study of Pitch Extraction Algorithms on a Large Variety of Singing Sounds

[Nighttime artificial light and disorders of the human internal clock (1)].