A Study on the Speech Timbre Space Based on Subjective Evaluation

Wei Zhao, Yue Lian, Zhongwen Tu, Qifeng Dou
DOI: https://doi.org/10.1109/icdsca53499.2021.9650323
2021-01-01
Abstract: This paper proposes a speech timbre space that can be used for objective evaluation and develops a small-to-medium-sized speech timbre corpus based on expert screening. First, we collected 32 terms describing vibratory vocalizations by reviewing a large body of literature (e.g., on musical instruments and the human voice). For terms with high similarity, the series category method and multidimensional scaling were used to filter them and to determine the dimensions of the speech timbre space that match subjective listening perception. Next, we used k-means clustering to obtain the final 5-dimensional speech timbre space. Finally, we collected a large number of pure Chinese speech samples from the Internet and conducted subjective evaluation experiments to obtain a corpus with timbre labels. The results of the subjective experiments show that the established speech timbre space can describe speech in daily life. This timbre space can subsequently be used in deep learning to process input audio with timbre labels.
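The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the descriptive terms have already been placed in a low-dimensional space (for example by multidimensional scaling) and groups nearby terms with plain k-means. The term names, their coordinates, the `kmeans` helper, and the choice of k = 3 are all hypothetical and chosen only to keep the example short; the paper itself reports a 5-dimensional space, and its actual data and procedure are not reproduced here.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical timbre terms with made-up 2-D coordinates (not from the paper).
terms = {
    "bright": (0.9, 0.1), "brilliant": (1.0, 0.2),
    "dark": (-0.9, 0.0), "dull": (-1.0, 0.1),
    "hoarse": (0.0, 1.0), "rough": (0.1, 0.9),
}
labels, centroids = kmeans(list(terms.values()), k=3)

# Collect the terms that fall into each cluster.
groups = {}
for name, lab in zip(terms, labels):
    groups.setdefault(lab, []).append(name)
print(groups)
```

In this toy setting each cluster of similar terms would stand for one candidate timbre dimension; how the authors map clusters to the final dimensions is not specified in the abstract.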