Energy Efficient Graph-Based Hybrid Learning for Speech Emotion Recognition on Humanoid Robot

Haowen Wu, Hanyue Xu, Kah Phooi Seng, Jieli Chen, Li Minn Ang

DOI: https://doi.org/10.3390/electronics13061151

IF: 2.9

2024-03-21

Electronics

Abstract: This paper presents a novel deep graph-based learning technique for speech emotion recognition which has been specifically tailored for energy efficient deployment within humanoid robots. Our methodology represents a fusion of scalable graph representations, rooted in the foundational principles of graph signal processing theories. By delving into the utilization of cycle or line graphs as fundamental constituents shaping a robust Graph Convolution Network (GCN)-based architecture, we propose an approach which allows the capture of relationships between speech signals to decode intricate emotional patterns and responses. Our methodology is validated and benchmarked against established databases such as IEMOCAP and MSP-IMPROV. Our model outperforms standard GCNs and prevalent deep graph architectures, demonstrating performance levels that align with state-of-the-art methodologies. Notably, our model achieves this feat while significantly reducing the number of learnable parameters, thereby increasing computational efficiency and bolstering its suitability for resource-constrained environments. This proposed energy-efficient graph-based hybrid learning methodology is applied towards multimodal emotion recognition within humanoid robots. Its capacity to deliver competitive performance while streamlining computational complexity and energy efficiency represents a novel approach in evolving emotion recognition systems, catering to diverse real-world applications where precision in emotion recognition within humanoid robots stands as a pivotal requisite.
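
To make the cycle and line graph idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each graph node is one frame of speech features, that "line graph" means a path graph linking consecutive frames, and that the layer uses the widely known symmetric-normalized GCN propagation rule with self-loops. All function names are hypothetical.

```python
import math


def line_graph_adjacency(n):
    """Path ("line") graph: frame i is linked to frames i-1 and i+1."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


def cycle_graph_adjacency(n):
    """Cycle graph: a path graph whose last frame also links to the first."""
    adj = line_graph_adjacency(n)
    if n > 2:
        adj[0][n - 1] = adj[n - 1][0] = 1.0
    return adj


def normalize_with_self_loops(adj):
    """Compute D^(-1/2) (A + I) D^(-1/2), an assumed (standard) normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    z = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Usage: 4 frames, 1 feature per frame, a 1x1 weight matrix (toy values).
norm_adj = normalize_with_self_loops(cycle_graph_adjacency(4))
hidden = gcn_layer(norm_adj, [[1.0], [1.0], [1.0], [1.0]], [[1.0]])
```

Because the cycle and line graphs have a fixed, regular structure, the adjacency matrix does not have to be learned or stored per utterance. Only the weight matrix is trainable in this sketch.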

engineering, electrical & electronic; computer science, information systems; physics, applied

What problem does this paper attempt to address?

The paper addresses three key issues:

1. **Improving Emotion Recognition Accuracy**: The paper proposes a hybrid learning method that combines Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to improve speech emotion recognition. The method pairs the strength of CNNs in local feature extraction with the ability of GCNs to capture dependencies in time-series data.
2. **Achieving High Efficiency in Resource-Constrained Environments**: For resource-constrained settings such as humanoid robots, the paper studies an energy-efficient hybrid learning model. The model reduces the number of learnable parameters, which lowers computational complexity and energy consumption.
3. **Multimodal Emotion Recognition**: Although the paper focuses primarily on speech-based emotion recognition, it lays the groundwork for extending the approach to multimodal emotion recognition. This would let robots interact more naturally with humans and better understand human emotional states.

In summary, the research aims to reduce the demand for computational resources while maintaining high performance, so that emotion recognition can be deployed in resource-constrained environments such as humanoid robots.
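
One way to see why a CNN plus GCN hybrid can stay small is to count trainable parameters. The sketch below is illustrative only: the layer sizes are hypothetical and are not taken from the paper, and the dense baseline is an assumed point of comparison, not one the paper reports. It relies on two general facts: a 1D convolution shares its kernel across time steps, and a graph convolution shares one weight matrix across all nodes.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a 1D convolution (kernel shared over time)."""
    return in_channels * out_channels * kernel_size + (out_channels if bias else 0)


def gcn_params(in_dim, out_dim, bias=True):
    """Trainable parameters of one graph convolution (weights shared over nodes)."""
    return in_dim * out_dim + (out_dim if bias else 0)


def dense_on_flattened_params(num_frames, in_dim, out_dim, bias=True):
    """Trainable parameters of a dense layer applied to all frames flattened."""
    return num_frames * in_dim * out_dim + (out_dim if bias else 0)


# Hypothetical sizes: 120 frames, 40 input features, 64 hidden units.
num_frames, in_feats, hidden = 120, 40, 64

hybrid_total = conv1d_params(in_feats, hidden, kernel_size=3) + gcn_params(hidden, hidden)
dense_total = dense_on_flattened_params(num_frames, hidden, hidden)

print("hybrid CNN + GCN parameters:", hybrid_total)
print("dense-on-flattened parameters:", dense_total)
```

In this sketch the hybrid count does not depend on the number of frames, while the dense baseline grows linearly with it. That is a general property of weight sharing, offered here as a plausible reading of the efficiency claim.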

cGAN Based Facial Expression Recognition for Human-Robot Interaction

A Multi-Head Pseudo Nodes Based Spatial–temporal Graph Convolutional Network for Emotion Perception from GAIT

Self-attention Transfer Networks for Speech Emotion Recognition

A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots

Multimodal Facial Emotion Recognition Using Improved Convolution Neural Networks Model

Speech Emotion Recognition Based on Temporal-Spatial Learnable Graph Convolutional Neural Network

A Facial Expression Emotion Recognition Based Human-robot Interaction System

Speech Emotion Recognition Based on Convolutional Neural Network with Attention-Based Bidirectional Long Short-Term Memory Network and Multi-Task Learning

Intelligent Facial Emotion Recognition and Semantic-Based Topic Detection for A Humanoid Robot

Optimized, robust, real-time emotion prediction for human-robot interactions using deep learning

Adaptive Speech Emotion Representation Learning Based On Dynamic Graph

EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning

Emotion Recognition in Conversation Based on a Dynamic Complementary Graph Convolutional Network

Graph Neural Network-Based Speech Emotion Recognition: A Fusion of Skip Graph Convolutional Networks and Graph Attention Networks

A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition

Emotion recognition models for companion robots

Data-driven emotional body language generation for social robotics

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Speech emotion recognition via graph-based representations

Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction