Searching Audio-Visual Clips for Dual-mode Chinese Emotional Speech Database

Xudong Zhang, Guoqing Wu, Fuji Ren
DOI: https://doi.org/10.1109/ACIIAsia.2018.8470387
2018-01-01
Abstract: A widely accepted Chinese emotional speech database with abundant spontaneous speech is essential to Chinese emotional speech recognition and affective computing. This paper presents a new method for constructing such a Chinese audio-visual spontaneous emotional speech database. The source materials come from a variety of Chinese-language videos. Voice Activity Detection is used to capture the start and end times of the syntactic boundaries in a dialogue. These time sets help the subsequent extraction step ensure that each clip covers a complete phrase or sentence. The Microsoft Emotion API is adopted to compute frame-level confidence scores across a set of eight emotional states from the videos. A joint compression-discrimination algorithm is presented to detect which clips are accepted as candidates and which emotional state each most likely expresses. Finally, a manual listening test and manual correction are carried out. Data analysis shows that the proposed method is feasible and effective.
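The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's joint compression-discrimination algorithm, whose details the abstract does not give. As a stand-in, it assumes that speech segment boundaries and per-frame emotion confidences are already available, averages the confidences inside each segment, and accepts a segment when its dominant emotion reaches a threshold. The function name `label_segments`, the input shapes, and the threshold value of 0.6 are all hypothetical.

```python
from statistics import mean

# The eight emotional states scored per frame (assumed label names).
EMOTIONS = ("anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise")


def label_segments(segments, frames, threshold=0.6):
    """Pick candidate clips by averaging frame-level emotion confidences.

    segments:  list of (start, end) times in seconds, e.g. from a VAD step.
    frames:    list of (timestamp, {emotion: confidence}) pairs.
    threshold: hypothetical minimum mean confidence for acceptance.
    Returns a list of (start, end, emotion, mean_confidence) tuples.
    """
    results = []
    for start, end in segments:
        # Collect the score dictionaries of frames that fall in this segment.
        inside = [scores for t, scores in frames if start <= t <= end]
        if not inside:
            continue  # no video frames scored for this segment
        # Mean confidence per emotion; a missing emotion counts as 0.0.
        avg = {e: mean(s.get(e, 0.0) for s in inside) for e in EMOTIONS}
        best = max(avg, key=avg.get)
        if avg[best] >= threshold:
            results.append((start, end, best, avg[best]))
    return results
```

In this simplified form, a segment whose frames disagree on the emotion tends to fall below the threshold and is dropped, which is one plausible reason a final manual listening pass is still needed.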