Speech Emotion Recognition Based on Clustering Assistance

Zhi-Kun Peng, Zhen-Tao Liu, Meng-Ting Han
DOI: https://doi.org/10.1109/cac57257.2022.10054824
2022-01-01
Abstract: Speech emotion recognition (SER) is a key technology for natural human-computer interaction. Progress in SER depends heavily on the amount of available sample data. In recent years, research on SER has increasingly turned to data augmentation. However, most of these methods simply enlarge the sample set and neglect to analyze and exploit the feature distribution of the samples. In this paper, we propose a new SER framework based on clustering assistance that uses the feature distribution information of the sample data directly and effectively. The framework clusters the samples, takes the proportion of each emotion category within each cluster, and converts it into a probability score, called the clustering emotion probability score. This score is then fused with the emotion probability score from a simple classification model using different fusion weighting factors. We evaluated the proposed method against the baseline model on the IEMOCAP dataset. Experimental results show that our method outperforms the baseline model in both weighted and unweighted accuracy.
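The scoring and fusion steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the clustering algorithm or the exact fusion rule, so the sketch assumes cluster assignments are already available and assumes a linear weighted sum for the fusion. The function names, the toy data, and the weight value are all hypothetical.

```python
def cluster_emotion_scores(cluster_ids, labels, n_classes):
    """Proportion of each emotion class among the training samples of each cluster."""
    counts = {}
    for c, y in zip(cluster_ids, labels):
        counts.setdefault(c, [0] * n_classes)[y] += 1
    return {c: [k / sum(row) for k in row] for c, row in counts.items()}


def fuse(classifier_probs, cluster_probs, weight):
    """Assumed fusion rule: weighted sum of the two score vectors, weight in [0, 1]."""
    return [weight * q + (1 - weight) * p
            for p, q in zip(classifier_probs, cluster_probs)]


# Toy training data: cluster index and emotion label (0, 1, 2) per sample.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
labels = [0, 0, 1, 1, 1, 1, 2]
scores = cluster_emotion_scores(cluster_ids, labels, n_classes=3)

# A test sample that falls in cluster 1; the classifier alone slightly prefers class 0.
classifier_probs = [0.5, 0.4, 0.1]
fused = fuse(classifier_probs, scores[1], weight=0.5)
predicted = max(range(3), key=fused.__getitem__)
print(scores[1], fused, predicted)
```

In this toy case the classifier alone would pick class 0, while the cluster the sample falls into is dominated by class 1, so the fused score shifts the prediction to class 1. Varying `weight` controls how much the cluster distribution influences the final decision.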