CENN: Capsule-enhanced neural network with innovative metrics for robust speech emotion recognition

Huiyun Zhang, Heming Huang, Puyang Zhao, Xiaojun Zhu, Zhenbao Yu
DOI: https://doi.org/10.1016/j.knosys.2024.112499
IF: 8.139
2024-09-13
Knowledge-Based Systems
Abstract: Speech emotion recognition (SER) plays a pivotal role in enhancing human-computer interaction (HCI) systems. This paper introduces a Capsule-enhanced neural network (CENN) that advances the state of SER through a robust and reproducible deep learning framework. The CENN architecture integrates Multi-head attention (MHA), a residual module, and a capsule module, which together enhance the model's capacity to capture both the global and local features essential for precise emotion classification. A key contribution of this work is a comprehensive reproducibility framework featuring two novel metrics: General learning reproducibility (GLR) and Correct learning reproducibility (CLR). These metrics, alongside their fractional and perfect variants, offer a multi-dimensional evaluation of the model's consistency and correctness across multiple executions, thereby ensuring the reliability and credibility of the results. To tackle the persistent challenge of overfitting in deep learning models, we propose an overfitting metric that accounts for the relationship between training and testing errors, model complexity, and data complexity. This metric, together with newly introduced generalization and robustness metrics, provides a holistic assessment of the model's performance and guides the application of regularization techniques to maintain generalizability and resilience. Extensive experiments on benchmark SER datasets show that the CENN model not only surpasses existing approaches in accuracy but also sets a new benchmark in reproducibility. This work establishes a new paradigm for deep learning model development in SER, underscoring the importance of reproducibility and offering a rigorous framework for future research.
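The abstract names GLR and CLR and their "fractional" and "perfect" variants but does not say how they are computed. The sketch below is one hypothetical reading, not the authors' definitions: given the test-set predictions from several independent training runs, it counts a sample as generally reproducible if every run predicts the same label for it, and as correctly reproducible if every run predicts the true label. The "fractional" score is then the share of such samples, and the "perfect" score is 1.0 only when every sample qualifies. The function name `reproducibility_scores` and the dictionary keys are illustrative assumptions.

```python
from typing import Dict, Sequence


def reproducibility_scores(
    runs: Sequence[Sequence[int]], labels: Sequence[int]
) -> Dict[str, float]:
    """Hypothetical GLR/CLR-style scores (assumed definitions, not the paper's).

    runs:   one list of predicted class labels per training run
    labels: ground-truth class labels for the same test samples
    """
    n = len(labels)
    if not runs or n == 0:
        raise ValueError("need at least one run and one sample")
    if any(len(run) != n for run in runs):
        raise ValueError("every run must predict every sample")

    # Assumed "general" reproducibility: all runs output the same label.
    agree = [len({run[i] for run in runs}) == 1 for i in range(n)]
    # Assumed "correct" reproducibility: all runs output the true label.
    correct = [all(run[i] == labels[i] for run in runs) for i in range(n)]

    return {
        "GLR_fractional": sum(agree) / n,      # share of consistent samples
        "GLR_perfect": float(all(agree)),      # 1.0 only if every sample is consistent
        "CLR_fractional": sum(correct) / n,    # share of always-correct samples
        "CLR_perfect": float(all(correct)),    # 1.0 only if every sample is always correct
    }


if __name__ == "__main__":
    # Three runs over four test utterances (toy emotion class IDs).
    runs = [[0, 1, 2, 3], [0, 1, 2, 0], [0, 1, 2, 3]]
    labels = [0, 1, 1, 3]
    print(reproducibility_scores(runs, labels))
    # {'GLR_fractional': 0.75, 'GLR_perfect': 0.0, 'CLR_fractional': 0.5, 'CLR_perfect': 0.0}
```

Under this reading CLR can never exceed GLR, because a sample that every run classifies correctly is necessarily one on which all runs agree; the third sample in the toy example shows the reverse does not hold (all runs agree on a wrong label).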
computer science, artificial intelligence
What problem does this paper attempt to address?