Abstract: As technology has advanced, research on speech emotion recognition and semantic analysis has grown in importance, with applications mainly in companion robots, technology products, and medicine. This research proposes a communication system with speech emotion recognition. The system pre-processes speech with a sound data enhancement method and transforms the sound into a spectrogram using MFCC (Mel Frequency Cepstral Coefficients). GoogLeNet, a CNN (Convolutional Neural Network), is then applied to recognize five emotions (peace, happiness, sadness, anger, and fear), with a top recognition accuracy of 79.81%. For semantic analysis, the training texts are divided into two categories, positive and negative, and the chat conversations are generated with the Seq2Seq framework of an RNN (Recurrent Neural Network). The system has two parts, a client and a server. The client is an Android application, and the server runs on Ubuntu Linux combined with a web server. With this two-terminal framework, users can record their voice in the app on their cellphone and upload the voice file to the server. The voice then undergoes speech emotion recognition by the CNN and semantic analysis by the RNN, so that the system functions as a chatbot that responds positively or negatively based on the detected emotion and shows the results in the app on the user's cellphone. The main contributions of this research are: 1) This study introduces Chinese word vectors to the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation. 2) Traditional emotion identification must first tokenize the Chinese words, analyze the clauses and parts of speech, and capture the emotional keywords before they are interpreted by an expert system; in contrast, this study converts the input sentence into a spectrogram by MFCC and classifies it directly with the convolutional neural network. 3) In addition to implementing the companion robot, the system can collect the user's emotional index for analysis by a back-end care organization. Moreover, compared with commercial humanoid companion robots, this study is delivered as an app, which is easier to use and more economical.
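
The MFCC step named in the abstract can be illustrated with a minimal sketch. The abstract gives no frame length, hop size, filter count, or sample rate, so every parameter below is a hypothetical choice for illustration, and the function names are not from the paper. The code uses only the Python standard library and a naive DFT to stay self-contained; a real system would more likely rely on an audio library and then render the resulting matrix as an image for the CNN.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=12, n_coeffs=8):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameters are illustrative assumptions, not the paper's values.
    """
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log energy per mel band; the small constant avoids log(0).
        log_e = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10)
                 for filt in bank]
        # DCT-II of the log mel energies gives the cepstral coefficients.
        rows.append([sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                         for j, e in enumerate(log_e))
                     for c in range(n_coeffs)])
    return rows

# Usage: a synthetic 440 Hz tone stands in for a recorded voice clip.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(512)]
features = mfcc(tone, sample_rate)
print(len(features), "frames x", len(features[0]), "coefficients")
```

Each row of `features` corresponds to one time frame, so stacking the rows yields a time-by-coefficient matrix of the kind that could be fed to an image classifier such as the CNN described above.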

Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach.

Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation Based Deep Learning Approach

Emotion Inferring from Large-scale Internet Voice Data: A Multimodal Deep Learning Approach

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Inferring Emotions from Large-Scale Internet Voice Data.

Self-attention Transfer Networks for Speech Emotion Recognition

Inferring Users' Emotions For Human-Mobile Voice Dialogue Applications

Inferring Emphasis for Real Voice Data: an Attentive Multimodal Neural Network Approach.

Speech Emotion Recognition by Combining a Unified First-Order Attention Network with Data Balance

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.

Learning to Infer Public Emotions from Large-Scale Networked Voice Data

Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition.

MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation

A Novel Dual-Modal Emotion Recognition Algorithm with Fusing Hybrid Features of Audio Signal and Speech Context

A New Network Structure for Speech Emotion Recognition Research

Study on emotion recognition and companion Chatbot using deep neural network

An autoencoder-based feature level fusion for speech emotion recognition

Combining cross-modal knowledge transfer and semi-supervised learning for speech emotion recognition