Abstract: Emotion computing is a necessary part of advanced human–computer interaction. In novels, an appropriate description of a character's facial expressions, body language, and speaking style enables readers to infer the character's emotions. Moreover, multimodal information is complementary and integrated. Fusing information from multiple modalities into a textual modality can yield better fusion results and overcome the bias of interpreting unimodal information. Inspired by these facts, we develop a novel emotion-aware method based on fusing textual descriptions of speech, body movements, and facial expressions, which reduces their dimensionality by merging the three types of information into a unified component. Specifically, to fuse multimodal features for emotion recognition, we propose a two-stage neural network. First, bidirectional long short-term memory with conditional random fields (Bi-LSTM-CRF) and a back-propagation neural network (BPNN) analyze the vocal and visual features extracted from facial expressions, body movements, and speech to obtain textual descriptions of the different features. Second, the textual descriptions of the features are fused through a neural network with a self-organizing map (SOM) layer and are used to compensate layers that are trained on a web-based corpus. The advantages of this method are that it uses depth information to track facial and body movements and employs an explainable textual intermediate representation to fuse the features. We experimentally tested the emotion-aware system in real-world applications, and the results indicate that our system can quickly and steadily recognize human emotions. Compared with other unimodal and multimodal-fusion algorithms, our method is more precise, improving accuracy by up to 30% over the unimodal method.
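
The second stage described in the abstract, fusing per-modality textual descriptions through a self-organizing map, can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor vocabulary, the bag-of-words encoding, the one-dimensional map, and all hyperparameters are assumptions made for illustration, and the first stage (Bi-LSTM-CRF and BPNN) is replaced by hand-written descriptor lists.

```python
import numpy as np

# Hypothetical descriptor vocabulary (an assumption, not taken from the paper).
VOCAB = ["smile", "frown", "raised_voice", "soft_voice", "open_arms", "crossed_arms"]

def encode(descriptions):
    """Merge textual descriptors from all modalities into one bag-of-words vector."""
    vec = np.zeros(len(VOCAB))
    for word in descriptions:
        vec[VOCAB.index(word)] = 1.0
    return vec

def best_matching_unit(weights, x):
    """Return the index of the SOM unit whose weight vector is closest to x."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM: pull the winning unit and its grid neighbours toward each sample."""
    rng = np.random.default_rng(seed)
    weights = rng.random((n_units, data.shape[1]))
    grid = np.arange(n_units)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay of both schedules
        lr = lr0 * frac
        sigma = sigma0 * frac + 1e-3         # keep the neighbourhood width positive
        for x in data:
            bmu = best_matching_unit(weights, x)
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

# Stand-ins for stage-one output: descriptors for face, voice, and body.
samples = np.array([
    encode(["smile", "soft_voice", "open_arms"]),
    encode(["frown", "raised_voice", "crossed_arms"]),
])
som = train_som(samples)
units = [best_matching_unit(som, s) for s in samples]
```

In a fuller system, each map unit would be associated with an emotion label by a downstream classifier; the sketch stops at the unit index because the abstract does not specify that step.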

A Novel Chinese Character Recognition Method Based on Multi-Modal Fusion

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning

Chinese Character Recognition Method Based On Multi-Features And Parallel Neural Network Computation

A novel multilevel stacked SqueezeNet model for handwritten Chinese character recognition

Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition

Multi-modal fusion network guided by prior knowledge for 3D CAD model recognition

Multi-modal fusion network with complementarity and importance for emotion recognition

Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis

A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Application of Multimodal Fusion Deep Learning Model in Disease Recognition

Dual-Branch Multitask Fusion Network for Offline Chinese Writer Identification

A Novel Multimodal Fusion Network Based on a Joint Coding Model for Lane Line Segmentation

Constructing multi-modal emotion recognition model based on convolutional neural network

Multi-head attention fusion networks for multi-modal speech emotion recognition

A Novel Deep Multi-Modal Feature Fusion Method for Celebrity Video Identification

Adaptive information fusion network for multi‐modal personality recognition

Multi-modality Fusion Network for Action Recognition

Chinese text classification based on attention mechanism and feature-enhanced fusion neural network