Abstract: Speech reflects a speaker's mental state, and capturing it with a microphone sensor is a promising channel for human-computer interaction. Speech recognition with this sensor can also aid the diagnosis of mental illnesses. However, gender differences between speakers affect speech emotion recognition based on specific acoustic features and reduce its accuracy. We therefore hypothesize that the accuracy of speech emotion recognition can be improved by selecting different speech features according to the speaker's gender. In this paper, we propose a speech emotion recognition method based on gender classification. First, we use a multilayer perceptron (MLP) to classify the original speech by gender. Second, based on the differing acoustic features of male and female speech, we analyze the influence weights of multiple speech emotion features for each gender and establish the optimal feature sets for male and female emotion recognition, respectively. Finally, we train and test CNN and BiLSTM models, respectively, using the male and female speech emotion feature sets. The results show that the proposed emotion recognition models achieve higher average recognition accuracy than gender-mixed recognition models.
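The three-stage flow described in the abstract (classify gender, select a gender-specific feature set, run a gender-specific emotion model) can be sketched as a simple router. This is only an illustrative sketch, not the authors' implementation: the class name `GenderRoutedRecognizer`, the feature names, the 165 Hz pitch threshold, and the rule-based stand-in models are all hypothetical. In the paper the gender classifier is an MLP and the emotion models are a CNN and a BiLSTM, and the feature sets come from an analysis of feature influence weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]  # feature name -> value for one utterance


@dataclass
class GenderRoutedRecognizer:
    """Two-stage pipeline: predict gender, then dispatch to a per-gender model."""
    gender_classifier: Callable[[Features], str]
    feature_sets: Dict[str, Sequence[str]]           # gender -> selected feature names
    emotion_models: Dict[str, Callable[[List[float]], str]]

    def predict(self, features: Features) -> Tuple[str, str]:
        gender = self.gender_classifier(features)
        # Keep only the features chosen for this gender, in a fixed order.
        selected = [features[name] for name in self.feature_sets[gender]]
        return gender, self.emotion_models[gender](selected)


# Toy stand-ins (assumptions for illustration only; the paper trains real models).
def toy_gender_classifier(f: Features) -> str:
    return "female" if f["pitch_mean_hz"] >= 165.0 else "male"


def toy_male_model(x: List[float]) -> str:
    energy, _pitch_std = x
    return "angry" if energy > 0.7 else "neutral"


def toy_female_model(x: List[float]) -> str:
    pitch_std, _mfcc_1 = x
    return "happy" if pitch_std > 40.0 else "neutral"


recognizer = GenderRoutedRecognizer(
    gender_classifier=toy_gender_classifier,
    feature_sets={
        "male": ["energy", "pitch_std_hz"],      # hypothetical male feature set
        "female": ["pitch_std_hz", "mfcc_1"],    # hypothetical female feature set
    },
    emotion_models={"male": toy_male_model, "female": toy_female_model},
)

sample = {"pitch_mean_hz": 210.0, "pitch_std_hz": 55.0, "energy": 0.4, "mfcc_1": -3.2}
print(recognizer.predict(sample))
```

The point of the design is that each emotion model only ever sees the feature subset selected for its gender, so a gender-mixed baseline corresponds to using one feature set and one model for every utterance.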
