Abstract: As technology advances, research on speech emotion recognition and semantic analysis has grown in importance, with applications mainly in companion robots, technology products, and medical purposes. This research proposes a communication system with speech emotion recognition. For speech emotion recognition, the system pre-processes speech with a sound data enhancement method and transforms the audio into a spectrogram using MFCC (Mel Frequency Cepstral Coefficients). GoogLeNet, a CNN (Convolutional Neural Network) architecture, is then applied to recognize five emotions (peace, happy, sad, angry, and fear), reaching a top recognition accuracy of 79.81%. For semantic analysis, the training texts are divided into two categories, positive and negative, and the chat conversations are conducted with the Seq2Seq framework based on an RNN (Recurrent Neural Network). The system has two parts, a client and a server. The client is developed as an Android application, and the server runs on Ubuntu Linux combined with a web server. With this two-part framework, a user records speech in the app on his or her cell phone and uploads the voice file to the server. The voice then undergoes speech emotion recognition by the CNN and semantic analysis by the RNN, so that the system functions as a chatbot that responds positively or negatively based on the detected emotion and shows the results in the app on the user's cell phone.
The main contributions of this research are: 1) it introduces Chinese word vectors into the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation; 2) whereas the traditional method of emotion identification must first tokenize the Chinese words, analyze the clauses and parts of speech, and capture the emotional keywords before interpretation by an expert system, this study classifies the input directly with the convolutional neural network after the input sentence is converted into a spectrogram by MFCC; and 3) in addition to implementing the companion robot, the system collects the user's emotional index for analysis by a back-end care organization. Moreover, compared with other commercial humanoid companion robots, this study is delivered as an app, which is easier to use and more economical.
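To make the MFCC step in the abstract more concrete, the following is a minimal, standard-library-only sketch of how a speech signal can be turned into a frames-by-coefficients matrix that a CNN could treat as an image. It is not the authors' implementation: the function names, the frame length, the hop size, the number of mel filters, and the number of coefficients are all illustrative assumptions, and a practical system would more likely rely on an FFT-based audio library than on the naive DFT used here.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames (trailing partial frame dropped)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (slow but dependency-free)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(w))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(w))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameter values are assumptions chosen for illustration.
    """
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        spec = power_spectrum(frame)
        # Log filterbank energies, floored to avoid log(0).
        log_e = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10))
                 for filt in bank]
        # DCT-II of the log energies gives the cepstral coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)])
    return features

# Usage: a synthetic 440 Hz tone stands in for a recorded utterance.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(1024)]
features = mfcc(tone, sample_rate)
print(len(features), len(features[0]))  # 7 frames x 13 coefficients
```

Stacking the per-frame coefficient rows yields a two-dimensional array, which is the kind of image-like input an image classifier such as GoogLeNet expects; how the authors resize or render that array is not stated in the abstract.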
