Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera

Hiroki Tanioka, Tetsushi Ueta, Masahiko Sano
2024-08-15
Abstract: The performance of ChatGPT© and other LLMs has improved tremendously, and in online environments they are increasingly likely to be used in a wide variety of situations, such as chatbots on web pages, call center operations using voice interaction, and dialogue functions using agents. In offline environments, multimodal dialogue functions are also being realized, such as guidance by Artificial Intelligence agents (AI agents) on tablet terminals and dialogue systems in which LLMs are mounted on robots. In such multimodal dialogue, mutual emotion recognition between the AI and the user will become important. So far, there have been methods for the AI agent to express emotions and methods for recognizing the user's emotions from the text or voice of their utterances, but methods for AI agents to recognize emotions from the user's facial expressions have not been studied. In this study, we examined whether LLM-based AI agents can interact with users according to their emotional states by capturing the user in dialogue with a camera, recognizing emotions from facial expressions, and adding this emotion information to prompts. The results confirmed that AI agents can converse in accordance with the user's emotional state when that state has a relatively high score, such as Happy or Angry.
Human-Computer Interaction, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
This paper addresses how a dialogue system built on a large language model (LLM) can recognize a user's emotions. Specifically, the researchers explored whether a camera can capture the user's facial expressions and whether adding the resulting emotion information, in JSON format, to the dialogue prompt enables an LLM-based AI agent to interact according to the user's emotional state. Previous studies mainly focused on how AI agents express emotions or recognize the user's emotions from text and voice, but rarely addressed recognizing emotions from facial expressions. This study targets that gap and tests experimentally whether adding emotion information to the dialogue is effective, particularly for emotional states such as "happiness" and "anger". The method uses the Python library FER to recognize facial expressions and combines the resulting emotion information with the natural-language dialogue content to form the complete prompt. The experimental results show that when users display different emotional states, the AI agent adjusts its responses accordingly, for example by expressing happiness to smiling users and offering care and support to angry or sad users. This suggests that adding emotion information can improve the interactive experience and effectiveness of a dialogue system, most clearly for emotions recognized with relatively high scores.
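The prompt-construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prompt wording, and the `MIN_SCORE` threshold are all hypothetical, chosen only for illustration. The input dictionary is assumed to have the same shape as the per-face `emotions` scores that FER's `detect_emotions` returns (emotion label mapped to a score between 0 and 1); here it is hard-coded so the sketch runs without a camera or the FER package.

```python
import json

# Hypothetical threshold (not from the paper): below this score,
# no single emotion is treated as dominant.
MIN_SCORE = 0.5


def dominant_emotion(scores, min_score=MIN_SCORE):
    """Return (label, score) for the top-scoring emotion, or None if it is too weak."""
    if not scores:
        return None
    label = max(scores, key=scores.get)
    score = scores[label]
    return (label, score) if score >= min_score else None


def build_prompt(utterance, scores):
    """Prefix the user's utterance with the emotion scores serialized as JSON."""
    payload = {"emotions": scores}
    top = dominant_emotion(scores)
    if top is not None:
        payload["dominant"] = top[0]
    return (
        "User facial-expression estimate (JSON): "
        + json.dumps(payload, sort_keys=True)
        + "\nUser says: "
        + utterance
    )


# Example scores, hard-coded here; in a live system they would come
# from a facial-expression recognizer such as FER.
example_scores = {"angry": 0.02, "happy": 0.91, "neutral": 0.05, "sad": 0.02}
prompt = build_prompt("I passed my exam today!", example_scores)
print(prompt)
```

The resulting string would then be sent to the LLM as the user turn. Passing the full score dictionary rather than only a single label is one possible design choice: it lets the model see when the recognizer is uncertain, which may matter given that the reported results were clearest for emotions with relatively high scores.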