Abstract: International Journal of Pattern Recognition and Artificial Intelligence, Ahead of Print. Emotion recognition is the task of understanding another person's emotions and thoughts. Modern technology allows machines to recognize objects without human intervention. Existing emotion recognition systems struggle to produce accurate results from limited audio data. To address this problem, a bag-of-audio-terms-based hybrid deep learning model is introduced, described as a pioneering deep learning model. Input voice data are drawn from a large dataset and pre-processed using data normalization and adaptive bilinear filtering. Acoustic features are then extracted from the voice signals to capture information relevant to emotion recognition. These features can include linear prediction coefficients (LPC), the three-dimensional (3D) log-mel spectrum, mel-frequency cepstral coefficients (MFCCs), and prosodic features. Feature selection is then performed using an improved wild horse optimization (WHO) approach. Finally, a hybrid capsule slime mould dense deep learning framework (HCSDN) is used for voice-based emotion recognition. System performance is evaluated on the IEMOCAP and EMODB datasets. On IEMOCAP, the proposed system achieves 96.78% accuracy, 96.45% specificity, 95.81% precision, 94.256% sensitivity, a 4.256% error rate, and a 0.75% false positive rate. On EMODB, it achieves 96.85% accuracy, 95.74% specificity, 96.12% precision, 95.25% sensitivity, a 3.432% error rate, and a 0.62% false positive rate.
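
The six metrics reported in the abstract can all be derived from confusion-matrix counts. The abstract does not state how its figures were computed or averaged across emotion classes, so the following is only a minimal sketch using the standard one-vs-rest definitions for a single class. The names `Counts` and `emotion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """One-vs-rest confusion counts for a single emotion class (hypothetical helper)."""
    tp: int  # target emotion correctly detected
    fp: int  # other emotion wrongly labelled as the target
    tn: int  # other emotion correctly rejected
    fn: int  # target emotion missed


def emotion_metrics(c: Counts) -> dict:
    """Standard metric definitions; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "error_rate": (c.fp + c.fn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),          # also called recall
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
        "false_positive_rate": c.fp / (c.fp + c.tn),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from IEMOCAP or EMODB.
    example = Counts(tp=90, fp=5, tn=95, fn=10)
    for name, value in emotion_metrics(example).items():
        print(f"{name}: {value:.2%}")
```

For a multi-class task such as emotion recognition, these per-class values are usually averaged across classes (macro or weighted); which scheme the paper uses is not specified in the abstract.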

Indian EmoSpeech Command Dataset: A dataset for emotion based speech recognition in the wild

EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues

BANSpEmo: A Bangla Emotional Speech Recognition Dataset

E‐Speech: Development of a Dataset for Speech Emotion Recognition and Analysis

ADAM optimised human speech emotion recogniser based on statistical information distribution of chroma, MFCC, and MBSE features

A lightweight 2D CNN based approach for speaker-independent emotion recognition from speech with new Indian Emotional Speech Corpora

Speech and Text-Based Emotion Recognizer

Trends in speech emotion recognition: a comprehensive survey

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Emotion Recognition With Audio, Video, EEG, and EMG: A Dataset and Baseline Approaches

Speech emotion recognition with deep convolutional neural networks

The Indian Spontaneous Expression Database for Emotion Recognition

Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts

An Efficient Voice-Based Emotion Recognition Using Hybrid Capsule Slime Mould Dense Deep Learning Framework

EmoBone: A Multinational Audio Dataset of Emotional Bone Conducted Speech

Memotion 3: Dataset on Sentiment and Emotion Analysis of Codemixed Hindi-English Memes

EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels

A Comparative Analysis of Different Approach for Basic Emotions Recognition from Speech

Improved Speech Emotion Classification Using Deep Neural Network

Deep Learning Techniques for Speech Emotion Recognition: A Review

BEmoC: A Corpus for Identifying Emotion in Bengali Texts