A survey of speech emotion recognition in natural environment

Md. Shah Fahad, Ashish Ranjan, Jainath Yadav, Akshay Deepak
DOI: https://doi.org/10.1016/j.dsp.2020.102951
Impact factor (IF): 2.92
2021-03-01
Digital Signal Processing
Abstract: While speech emotion recognition (SER) has been an active research field for the last three decades, techniques that deal with the natural environment have emerged only in the last decade. These techniques have reduced the mismatch between the training and testing data distributions. This mismatch arises because the training and testing datasets differ in speakers, texts, languages, and recording environments. Although a few good surveys of SER exist, they either do not cover all aspects of SER in natural environments or do not discuss the specifics in detail. This survey focuses on SER in natural environments. It discusses the relevant techniques and their advantages and disadvantages with respect to speaker, text, language, and recording environment. In the recent past, deep learning techniques have become very popular because they require minimal speech processing and deliver improved accuracy. Special attention is given to deep learning techniques and their related issues. The paper also discusses recent databases, features, and feature selection algorithms for SER that existing surveys have not covered and that are promising for SER in natural environments.
What problem does this paper attempt to address?