FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs

Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, Jun Ogata
2024-09-19
Abstract: This study presents FruitsMusic, a metadata corpus of Japanese idol-group songs in the real world, precisely annotated with who sings what and when. Japanese idol-group songs, vital to Japanese pop culture, feature a unique vocal arrangement style, where songs are divided into several segments, and a specific individual or multiple singers are assigned to each segment. To enhance singer diarization methods for recognizing such structures, we constructed FruitsMusic as a resource using 40 music videos of Japanese idol groups from YouTube. The corpus includes detailed annotations, covering songs across various genres, division and assignment styles, and groups ranging from 4 to 9 members. FruitsMusic also facilitates the development of various music information retrieval techniques, such as lyrics transcription and singer identification, benefiting not only Japanese idol-group songs but also a wide range of songs featuring single or multiple singers from various cultures. This paper offers a comprehensive overview of FruitsMusic, including its creation methodology and unique characteristics compared to conversational speech. Additionally, this paper evaluates the efficacy of current methods for singer embedding extraction and diarization in challenging real-world conditions using FruitsMusic. Furthermore, this paper examines potential improvements in automatic diarization performance through evaluating human performance.
Sound, Audio and Speech Processing
### What problem does this paper attempt to address?

This paper addresses singer diarization in Japanese idol-group songs. Its main contribution is **FruitsMusic**, a real-world corpus built to improve singer diarization technology. Specifically:

1. **Singer diarization problem**: Singer diarization means identifying "who is singing at what time" from the music signal. This task is crucial for understanding the structure and expression of idol-group songs. Most existing research is based on virtual idol songs from games and anime, which are relatively uniform in style and whose singers are easy to distinguish. Real-world idol-group songs have more complex divisions of singing parts, so more challenging datasets are needed.
2. **Lack of real-world datasets**: Existing research relies mainly on virtual idol songs, which differ significantly from real-world idol-group songs. To fill this gap, the authors constructed the **FruitsMusic** dataset, which contains real-world idol-group songs from YouTube and is annotated in detail with which singers sing each segment.
3. **Multimodal information processing**: Idol-group songs carry not only audio but also other modalities such as video content and lyrics. The design of FruitsMusic therefore also takes multimodal processing, such as multimodal diarization, into account.
4. **Application of music information retrieval (MIR) technology**: Beyond singer diarization, FruitsMusic can be used to develop and evaluate other MIR technologies, such as lyrics transcription, emotion classification, and singer identification. This helps improve the understanding and processing of songs with single or multiple singers across cultural and linguistic backgrounds.
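To make the "who is singing at what time" formulation concrete, the following is a minimal sketch of how such annotations could be represented and turned into frame-level singer activity. The `Segment` class, the `to_frames` helper, the singer labels, and the hop size are all hypothetical choices for illustration; they are not the actual FruitsMusic annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One annotated span: the singers active between start and end (seconds)."""
    start: float
    end: float
    singers: frozenset


def to_frames(segments, duration, hop=0.1):
    """Convert segments into a list holding the set of active singers per frame.

    Overlapping segments are merged, so a frame can contain several singers.
    """
    n_frames = int(round(duration / hop))
    frames = [set() for _ in range(n_frames)]
    for seg in segments:
        first = int(round(seg.start / hop))
        last = min(int(round(seg.end / hop)), n_frames)
        for i in range(first, last):
            frames[i] |= seg.singers
    return frames


# Hypothetical three-singer excerpt: a solo, a duet, and an overlapping solo.
segments = [
    Segment(0.0, 1.0, frozenset({"A"})),
    Segment(1.0, 2.0, frozenset({"B", "C"})),
    Segment(1.5, 2.5, frozenset({"A"})),
]
frames = to_frames(segments, duration=3.0, hop=0.5)
for index, active in enumerate(frames):
    print(index, sorted(active))
```

A per-frame set of labels is used here because several singers can be active at once, which is exactly the situation that makes idol-group songs harder to diarize than single-speaker material.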
### Characteristics of the FruitsMusic dataset

- **Real-world data**: FruitsMusic contains 40 real-world idol-group songs from YouTube, covering different genres and styles of dividing and assigning singing parts.
- **Detailed annotation**: Each song is annotated with which singers sing each segment, along with the segment's start and end times.
- **Diversity and complexity**: The dataset covers idol groups with 4 to 9 members.
- **Wide application**: It applies not only to Japanese idol-group songs but can also be extended to songs with multiple singers from other cultural backgrounds.

### Research methods

1. **Data collection and annotation**: 40 idol-group songs were collected from YouTube, and the singers of each segment were recorded by manual annotation.
2. **Model training and evaluation**: The FruitsMusic dataset was used to train and evaluate singer diarization models, including Self-Attention End-to-End Neural Diarization (SA-EEND) and pyannote.audio.
3. **Human evaluation**: A human evaluator performed manual diarization, providing a reference point for the performance of the automatic systems.

### Conclusion

By constructing the FruitsMusic dataset, the authors address the lack of real-world idol-group song data in existing research and provide a resource for developing singer diarization and other MIR technologies. The experimental results show that a pipeline system combining source separation and diarization performs better on complex real-world songs, and that using the FruitsMusic dataset significantly improves model performance.
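As a rough illustration of how diarization output might be scored, the sketch below computes a simplified frame-level diarization error rate (DER), a metric commonly used in diarization work. The summary above does not state which metric the paper uses, so the choice of DER is an assumption. The sketch also assumes that hypothesis labels are already mapped to reference labels, whereas standard DER scoring searches for the best mapping. The function name `frame_der` and the example data are hypothetical.

```python
def frame_der(reference, hypothesis):
    """Simplified frame-level DER for per-frame sets of singer labels.

    Assumes both sequences use the same label names and the same frame rate.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)        # reference singers left undetected
        false_alarm += max(0, n_hyp - n_ref)   # extra singers hypothesized
        confusion += min(n_ref, n_hyp) - n_correct  # wrong singer assigned
        total += n_ref
    if total == 0:
        raise ValueError("reference contains no singing frames")
    return (missed + false_alarm + confusion) / total


# Hypothetical five-frame example with one confusion, one miss, one false alarm.
reference = [{"A"}, {"A"}, {"B", "C"}, {"A", "B", "C"}, set()]
hypothesis = [{"A"}, {"B"}, {"B"}, {"A", "B", "C"}, {"A"}]
print(f"DER = {frame_der(reference, hypothesis):.3f}")
```

Because the denominator counts singer-frames rather than frames, passages where many members sing together weigh more heavily in the score than solo passages.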