KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Huali Zhou, Yuke Lin, Dong Liu, Ming Li
2024-08-21
Abstract: This work aims to promote Chinese opera research in both the musical and speech domains, with a primary focus on overcoming data limitations. We introduce KunquDB, a relatively large-scale, well-annotated audio-visual dataset comprising 339 speakers and 128 hours of content. Derived from the Kunqu Opera Art Canon (Kunqu yishu dadian), KunquDB is meticulously structured by dialogue lines and provides explicit annotations, including character names, speaker names, gender information, and vocal manner classifications, together with preliminary text transcriptions. KunquDB offers a versatile foundation for role-centric acoustic studies and for speech-related research, including automatic speaker verification (ASV). Beyond enriching opera research, the dataset bridges the gap between artistic expression and technological innovation. As a pioneering exploration of ASV in Chinese opera, we construct four test trials that account for two distinct vocal manners in opera voices: stage speech (ST) and singing (S). Domain adaptation methods effectively mitigate the domain mismatch induced by these vocal manner variations, although there remains room for further improvement on this benchmark.
Audio and Speech Processing, Sound, Image and Video Processing
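To make the annotation scheme and the trial setup concrete, here is a minimal Python sketch. The record fields mirror the annotations listed in the abstract (character name, speaker name, gender, vocal manner, transcription), but the class name `DialogueLine`, its field names, and the utterance IDs are hypothetical and do not describe KunquDB's actual file format. The sketch also assumes, without confirmation from the text, that the four test trials are the four enrollment/test combinations of stage speech (ST) and singing (S); the paper's actual protocol may differ.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class DialogueLine:
    """One annotated dialogue-line segment (hypothetical schema)."""
    utt_id: str
    character: str         # role name in the opera
    speaker: str           # performer who voices the role
    gender: str            # e.g. "F" or "M"
    vocal_manner: str      # "ST" (stage speech) or "S" (singing)
    transcript: str = ""   # preliminary text transcription


# Assumed reading of the four trials: (enrollment manner, test manner).
CONDITIONS = [("ST", "ST"), ("S", "S"), ("ST", "S"), ("S", "ST")]


def build_trials(lines, enroll_manner, test_manner):
    """Return (label, enroll_id, test_id) tuples; label 1 = same speaker."""
    enroll = [x for x in lines if x.vocal_manner == enroll_manner]
    test = [x for x in lines if x.vocal_manner == test_manner]
    if enroll_manner == test_manner:
        pairs = combinations(enroll, 2)  # unordered pairs, no self-comparison
    else:
        pairs = product(enroll, test)  # every cross-manner pairing
    return [(int(e.speaker == t.speaker), e.utt_id, t.utt_id) for e, t in pairs]
```

Labels follow the usual ASV convention: 1 marks a target (same-speaker) trial and 0 a non-target trial. All four assumed conditions can be built at once with `{c: build_trials(lines, *c) for c in CONDITIONS}`. A real protocol would typically subsample non-target pairs rather than enumerate every combination as this sketch does.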
What problem does this paper attempt to address?
The paper addresses the data scarcity that limits Chinese opera research in both the musical and speech domains. To overcome it, the authors introduce KunquDB, a relatively large-scale, carefully annotated audio-visual dataset with 339 speakers and 128 hours of content. The material is drawn from the Kunqu Opera Art Canon (Kunqu yishu dadian) and is segmented by dialogue line, with explicit annotations for character name, speaker name, gender, and vocal manner, plus preliminary text transcriptions. KunquDB thus supports role-centric acoustic research as well as speech-related tasks such as automatic speaker verification (ASV), and it bridges artistic expression and technological innovation. To explore ASV in Chinese opera, the paper constructs four test trials that cover two vocal manners: stage speech (ST) and singing (S). Domain adaptation methods effectively alleviate the domain mismatch caused by these differing vocal manners, although there is still room for improvement on this benchmark. In short, the paper's main goal is to advance Chinese opera research by providing a large-scale, high-quality dataset that addresses data insufficiency and lays a foundation for future work.
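Speaker verification trials like these are commonly scored by cosine similarity between speaker embeddings and summarized with the equal error rate (EER), the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The text does not state which metrics or adaptation methods the paper uses, so the following Python sketch is generic: `score_trials`, `domain_mean`, and `center` are hypothetical helpers, and mean-centering embeddings on unlabeled in-domain data is only one simple adaptation technique, not necessarily the paper's. Trials are assumed to be `(label, enroll_id, test_id)` tuples, with 1 marking a same-speaker trial.

```python
import math
from itertools import groupby


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def domain_mean(vectors):
    """Element-wise mean of a list of embeddings (e.g. unlabeled opera audio)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def center(embeddings, mean):
    """Subtract a domain mean from every embedding (a simple adaptation step)."""
    return {k: [x - m for x, m in zip(v, mean)] for k, v in embeddings.items()}


def score_trials(trials, embeddings):
    """Return (scores, labels) for (label, enroll_id, test_id) trials."""
    scores = [cosine(embeddings[e], embeddings[t]) for _, e, t in trials]
    labels = [label for label, _, _ in trials]
    return scores, labels


def compute_eer(scores, labels):
    """Equal error rate via a threshold sweep over the sorted scores.

    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")
    fa, fr = n_non, 0  # threshold below every score: accept all trials
    best_gap, best_eer = 1.0, 0.5  # FAR = 1, FRR = 0 at the start
    # Raise the threshold past each distinct score; tied trials move together.
    for _, group in groupby(sorted(zip(scores, labels)), key=lambda p: p[0]):
        for _, label in group:
            if label == 1:
                fr += 1  # a target trial is now rejected
            else:
                fa -= 1  # a non-target trial is no longer accepted
        far, frr = fa / n_non, fr / n_tar
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

If centering is used, it is applied before scoring, e.g. `score_trials(trials, center(emb, domain_mean(in_domain_vectors)))`, where `in_domain_vectors` stands for a hypothetical set of unlabeled opera embeddings. Comparing the EER of a cross-manner condition with and without such a step is one way to quantify how much a given adaptation method reduces the vocal-manner mismatch.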