Abstract: With advances in technology, research on speech emotion recognition and semantic analysis has grown in importance. This research is applied mainly in companion robots, technology products, and medical settings. This study proposes a communication system with speech emotion recognition. The system pre-processes speech with a sound data enhancement method and transforms the sound into a spectrogram by MFCC (Mel Frequency Cepstral Coefficient). GoogLeNet, a CNN (Convolutional Neural Network) architecture, is then applied to recognize five emotions (peace, happy, sad, angry, and fear), with a top recognition accuracy of 79.81%. For semantic analysis, the training texts are divided into two categories, positive and negative, and chat conversations are conducted in the Seq2Seq framework of an RNN (Recurrent Neural Network). The system has two parts, a client and a server. The client is developed as an Android application, and the server runs on Ubuntu Linux combined with a web server. With this two-terminal framework, users record their voice in the app on their cellphone and upload the voice file to the server. The voice then undergoes speech emotion recognition by the CNN and semantic analysis by the RNN, so that the system functions as a chatbot that responds positively or negatively based on the detected emotion and shows the results in the app on the user's cellphone. The main contributions of this research are: 1) this study introduces Chinese word vectors to the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation; 2) whereas traditional emotion identification must first tokenize the Chinese words, analyze the clauses and parts of speech, and capture the emotional keywords before interpretation by an expert system, this study converts the input sentence into a spectrogram by MFCC and classifies it directly with the convolutional neural network; and 3) in addition to implementing the companion robot, the system can collect the user's emotional index for analysis by a back-end care organization. Furthermore, compared with commercial humanoid companion robots, this study is delivered as an app, which is easier to use and more economical.
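The server-side step in which the detected emotion selects a positive or negative reply can be illustrated with a minimal sketch. It assumes the CNN outputs one raw score per emotion class. The names `REPLY_STYLE`, `pick_emotion`, and `choose_reply_style` are hypothetical, and the mapping from emotion to reply style is an assumption made for illustration only, since the abstract does not specify it.

```python
import math

# The five emotion classes named in the abstract.
EMOTIONS = ["peace", "happy", "sad", "angry", "fear"]

# ASSUMPTION: the abstract does not say which emotion leads to which
# reply style; this mapping is purely illustrative.
REPLY_STYLE = {
    "peace": "positive",
    "happy": "positive",
    "sad": "negative",
    "angry": "negative",
    "fear": "negative",
}


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_emotion(scores):
    """Return the most probable emotion label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def choose_reply_style(scores):
    """Bundle the detected emotion with the (assumed) reply style."""
    emotion, confidence = pick_emotion(scores)
    return {
        "emotion": emotion,
        "confidence": confidence,
        "style": REPLY_STYLE[emotion],
    }


if __name__ == "__main__":
    # Hypothetical classifier scores, one per entry in EMOTIONS.
    print(choose_reply_style([0.1, 2.5, 0.3, 0.0, -1.0]))
```

In a full system along the lines described above, the returned style would decide which of the two text categories (positive or negative) the Seq2Seq reply is drawn from. That wiring is not shown here.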

Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Modified Multi-scaled Retinex Using Chromaticity of Highlight Region for Correcting Color Distortion

FaceChat: An Emotion-Aware Face-to-face Dialogue Framework

Self-Emotion Blended Dialogue Generation in Social Simulation Agents

A Computational Approach to Emotion Recognition in Intelligent Agent

Affect Recognition in Conversations Using Large Language Models

Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation

A Multi-modal Eliza Using Natural Language Processing and Emotion Recognition

DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations

Study on emotion recognition and companion Chatbot using deep neural network

Building Emotional Support Chatbots in the Era of LLMs

Contextual Emotion Recognition using Large Vision Language Models

Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions

Empathy Through Multimodality in Conversational Interfaces

Towards Context-Aware Facial Emotion Reaction Database for Dyadic Interaction Settings

Investigating Large Language Models' Perception of Emotion Using Appraisal Theory

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

EmoChat: Bringing Multimodal Emotion Detection to Mobile Conversation

Language Models as Emotional Classifiers for Textual Conversation

Integrating Natural Language Understanding and a Cognitive Approach to Textual Emotion Recognition for Generating Human-Like Responses