Abstract: A weight-adapted convolution neural network (WACNN) is proposed to extract discriminative expression representations for facial expression recognition. It aims to exploit the potential of the convolution neural network (CNN) by using a hybrid genetic algorithm (HGA) with an optimal initial population to avoid local optima and speed up convergence, so that deep and global emotion understanding can be realized in human-robot interaction. Moreover, the idea of novelty search is introduced to solve the deception problem in the HGA; it expands the search space, which helps the genetic algorithm escape local optima and optimize large-scale parameters. In the proposal, facial expression image preprocessing is conducted first, and low-level expression features are then extracted by principal component analysis. Finally, high-level expression semantic features are extracted and recognized by the WACNN, which is optimized by the HGA. To evaluate the effectiveness of the WACNN, experiments on the JAFFE, CK+, and Static Facial Expressions in the Wild (SFEW) 2.0 databases are carried out using k-fold cross validation. The experimental results show that the recognition accuracies of the proposal are superior to those of state-of-the-art methods, such as the local directional ternary pattern and the weighted mixture deep neural network (DNN), which aim to extract discriminative features and are DNN-based methods. The recognition accuracies of the proposal are also higher than those of the deep CNN without the HGA, which indicates that the proposal has better global optimization ability. In addition, preliminary application experiments are carried out by running the proposed algorithm on the emotional social robot system, where nine volunteers and two-wheeled robots experience the scenario of emotion understanding. The application results indicate that the wheeled robots can recognize basic expressions, such as happy, surprise, and so on.
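
Two of the building blocks named in the abstract, principal component analysis for low-level features and a novelty score for the genetic search, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_features` and `novelty_scores`, the random stand-in data, and the choice of novelty as the mean distance to the k nearest neighbours in the population (the usual definition in the novelty search literature) are all assumptions made for illustration.

```python
import numpy as np


def pca_features(images, n_components):
    """Project flattened images onto their top principal components."""
    X = images.reshape(len(images), -1).astype(np.float64)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


def novelty_scores(population, k=3):
    """Mean distance from each individual to its k nearest neighbours."""
    diffs = population[:, None, :] - population[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # After sorting, column 0 is the zero self-distance, so skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)
# Hypothetical stand-in for preprocessed face crops (20 images of 8x8 pixels).
faces = rng.random((20, 8, 8))
features = pca_features(faces, n_components=5)

# Hypothetical GA population: 10 candidate weight vectors of length 5.
population = rng.normal(size=(10, 5))
scores = novelty_scores(population, k=3)
```

In a search of this kind, a higher novelty score marks a candidate that lies far from the rest of the population, so rewarding it alongside fitness is one way to keep the search from collapsing onto a single local optimum. How the paper actually combines novelty with fitness is not stated in the abstract.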

LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding

Visualizing and Understanding Neural Models in NLP

Weight-Adapted Convolution Neural Network for Facial Expression Recognition in Human–Robot Interaction

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Predicting Group Cohesiveness in Images

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

InfMLLM: A Unified Framework for Visual-Language Tasks

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Collaboration Analysis Using Deep Learning

Towards More Unified In-context Visual Understanding

Low Level Descriptors Based DBLSTM Bottleneck Feature for Speech Driven Talking Avatar

Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models

Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models

A Cohesive Distillation Architecture for Neural Language Models

Enhancing Image Description Generation through Deep Reinforcement Learning: Fusing Multiple Visual Features and Reward Mechanisms

HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog

Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding