Global-Local-Feature-Fused Driver Speech Emotion Detection for Intelligent Cockpit in Automated Driving.

Wenbo Li, Jiyong Xue, Ruichen Tan, Cong Wang, Zejian Deng, Shen Li, Gang Guo, Dongpu Cao
DOI: https://doi.org/10.1109/tiv.2023.3259988
IF: 8.2
2023-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Affective interaction between the intelligent cockpit and humans is an emerging topic full of opportunities. Robust recognition of the driver's emotions is the first step toward affective interaction, and having the intelligent cockpit recognize emotions from the driver's speech has broad potential for technical application. In this paper, we first proposed a multi-feature-fusion speech emotion recognition network with a parallel structure, which complementarily fuses the global acoustic features and the local spectral features of the entire utterance. Second, we designed and conducted a collection of speech data from drivers experiencing emotions and established the driver speech emotion (SpeechEmo) dataset, recorded in a dynamic driving environment with 40 participants. Finally, the proposed model was validated on SpeechEmo and on public datasets, and a quantitative analysis was carried out. The results show that the proposed model achieved advanced recognition performance, and ablation experiments confirmed the contribution of each of the model's components. The proposed model and dataset can support the future realization of human-vehicle affective interaction in intelligent cockpits and, in turn, a better human experience.
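The abstract describes a parallel structure that fuses global acoustic features with local spectral features of the whole utterance, but it does not specify the branches. The following is a minimal Python sketch of that two-branch data flow under stated assumptions, not the authors' model: the global branch (hypothetical `global_branch`) computes utterance-level statistics, the local branch (hypothetical `local_branch`) computes a crude per-frame log-magnitude spectrum pooled over time, and fusion is assumed to be plain concatenation. In the paper the branches are presumably learned network components; here they are fixed, hand-crafted stand-ins.

```python
import cmath
import math
from typing import List

Frames = List[List[float]]


def frame_signal(x: List[float], frame_len: int, hop: int) -> Frames:
    """Split a 1-D waveform into overlapping frames, dropping any incomplete tail."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def global_branch(frames: Frames) -> List[float]:
    """Hypothetical global branch: whole-utterance statistics.

    Returns [mean RMS energy, std of RMS energy, mean zero-crossing rate].
    """
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    zcr = [sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0)) / (len(f) - 1)
           for f in frames]
    mean_rms = sum(rms) / len(rms)
    std_rms = math.sqrt(sum((r - mean_rms) ** 2 for r in rms) / len(rms))
    return [mean_rms, std_rms, sum(zcr) / len(zcr)]


def local_branch(frames: Frames, n_bins: int) -> List[float]:
    """Hypothetical local branch: per-frame log-magnitude DFT bins (a crude
    spectrogram), mean-pooled over time as a stand-in for a learned local encoder."""
    n = len(frames[0])
    spectrogram = []
    for f in frames:
        row = []
        for k in range(n_bins):
            # Naive DFT coefficient for bin k of this frame.
            coeff = sum(f[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            row.append(math.log(abs(coeff) + 1e-8))
        spectrogram.append(row)
    # Average each frequency bin across all frames.
    return [sum(col) / len(col) for col in zip(*spectrogram)]


def fuse(global_vec: List[float], local_vec: List[float]) -> List[float]:
    """Assumed fusion rule: concatenate the two branch outputs."""
    return global_vec + local_vec


if __name__ == "__main__":
    sr = 8000  # sample rate in Hz (assumed for the demo)
    x = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
    frames = frame_signal(x, frame_len=200, hop=100)
    g = global_branch(frames)
    loc = local_branch(frames, n_bins=16)
    fused = fuse(g, loc)
    print(len(frames), len(fused))  # prints "7 19"
```

Keeping the two branches independent lets each summarise the same framed signal at a different scale before fusion; a learned model would replace both feature functions with trainable encoders and may use a richer fusion than concatenation.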
What problem does this paper attempt to address?