STraDa: A Singer Traits Dataset

Yuexuan Kong, Viet-Anh Tran, Romain Hennequin
2024-06-06
Abstract: Few large-scale public datasets contain both downloadable music audio files and rich lead singer metadata. To provide such a dataset and benefit research on singing voices, we created the Singer Traits Dataset (STraDa), which has two subsets: automatic-strada and annotated-strada. Automatic-strada contains twenty-five thousand tracks by more than five thousand unique lead singers, spanning numerous genres and languages, with cross-validated lead singer metadata as well as other track metadata. Annotated-strada consists of two hundred tracks that are balanced in terms of 2 genders, 5 languages, and 4 age groups. To show how the dataset's rich metadata and downloadable audio support model training and bias analysis, we benchmarked singer sex classification (SSC) and conducted a bias analysis.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the lack of large-scale public datasets that contain both downloadable music audio files and rich lead singer metadata, a gap that particularly affects singing voice research. Specifically:

1. **Insufficient scale and diversity of datasets**: Existing public datasets are usually small and cover a limited range of music genres and languages, so they cannot meet the needs of large-scale machine learning model training.
2. **Unclear annotation of lead singer information**: Many existing datasets do not clearly annotate who the lead singer is. Annotation errors are especially likely when a song has multiple lead singers.
3. **Lack of balanced subsets for evaluation and bias analysis**: Evaluating model performance and analyzing bias requires a dataset whose annotations are balanced across gender, language, and age group.

To address these problems, the authors created STraDa (Singer Traits Dataset), which contains two subsets:

- **automatic-strada**: A large-scale, automatically created subset containing 25,000 songs by more than 5,000 unique lead singers. It covers multiple genres and languages and provides cross-validated lead singer metadata.
- **annotated-strada**: A small, manually annotated subset containing 200 songs, balanced across two genders, five languages, and four age groups to ensure the accuracy and representativeness of the subset.

Together, the two subsets provide rich training data for singing-voice tasks as well as reliable test data for model evaluation and bias analysis.
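To make the notion of a balanced subset concrete, the following is a minimal sketch of how one might check balance across gender, language, and age group in a table of track metadata. It does not use the real STraDa files: the records are synthetic placeholders, and the field names (`gender`, `language`, `age_group`), the label values, and the helper functions are all hypothetical. It also assumes the strictest reading of "balanced", in which every gender, language, and age group combination holds the same number of tracks; with 200 tracks and 2 × 5 × 4 = 40 combinations, that would mean 5 tracks per combination. The source does not state whether the balance is joint or only per attribute.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholder labels: the source gives only the number of
# categories (2 genders, 5 languages, 4 age groups), not their names.
DOMAINS = {
    "gender": ["gender_1", "gender_2"],
    "language": [f"lang_{i}" for i in range(1, 6)],
    "age_group": [f"age_group_{i}" for i in range(1, 5)],
}


def make_synthetic_metadata(per_cell):
    """Build placeholder records with `per_cell` tracks per attribute combination."""
    keys = list(DOMAINS)
    records = []
    for values in product(*(DOMAINS[k] for k in keys)):
        for _ in range(per_cell):
            records.append(dict(zip(keys, values)))
    return records


def cell_counts(records, keys):
    """Count tracks for every possible combination of `keys`, including empty ones."""
    observed = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: observed.get(cell, 0) for cell in product(*(DOMAINS[k] for k in keys))}


def is_balanced(records, keys):
    """True if every possible combination of `keys` has the same track count."""
    return len(set(cell_counts(records, keys).values())) == 1


# Synthetic stand-in for a 200-track balanced subset (assumed joint balance).
tracks = make_synthetic_metadata(per_cell=5)
print(len(tracks))  # 200

# Balance per attribute, then across all three attributes jointly.
for key in DOMAINS:
    print(key, is_balanced(tracks, [key]))
print("joint", is_balanced(tracks, list(DOMAINS)))
```

Checking both the per-attribute and the joint counts matters for bias analysis: a subset can be balanced on each attribute separately while some combinations are over- or under-represented, which would confound comparisons between groups.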