Abstract: Facial expression recognition (FER) is a critical task in multimedia with significant implications across various domains. However, analyzing the causes of facial expressions is essential for accurately recognizing them. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but lack insight into the interactions and relationships between AUs and the overall expression. In this paper, we propose a novel method called ExpLLM, which leverages large language models to generate an accurate chain of thought (CoT) for facial expression recognition. Specifically, we have designed the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion. The key observations describe the AU's name, intensity, and associated emotions. The overall emotional interpretation provides an analysis based on multiple AUs and their interactions, identifying the dominant emotions and their relationships. Finally, the conclusion presents the final expression label derived from the preceding analysis. Furthermore, we also introduce the Exp-CoT Engine, designed to construct this expression CoT and generate instruction-description data for training our ExpLLM. Extensive experiments on the RAF-DB and AffectNet datasets demonstrate that ExpLLM outperforms current state-of-the-art FER methods. ExpLLM also surpasses the latest GPT-4o in expression CoT generation, particularly in recognizing micro-expressions where GPT-4o frequently fails.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address several key issues in Facial Expression Recognition (FER):

1. **Lack of in-depth analysis of expression causes**: Existing facial expression recognition methods typically only provide the names and intensities of Action Units (AUs) but lack a deep understanding of the interactions between these AUs and their relationship to the overall expression.
2. **Insufficient transparency and interpretability**: Current methods often lack an intuitive and transparent reasoning process when generating facial expression recognition results, making the results difficult to explain and verify.
3. **Challenges in micro-expression recognition**: Existing methods perform poorly in recognizing subtle expressions (such as micro-expressions), especially when dealing with complex emotional expressions.

To address these issues, the paper proposes a new method, **ExpLLM** (Expression Large Language Model), which utilizes large language models to generate a detailed "Chain of Thought" (CoT) to provide step-by-step analysis of facial expressions. Specifically, ExpLLM constructs this chain of thought from three key perspectives:

- **Key Observations**: Describes the name, intensity, and related emotions of each AU.
- **Overall Emotional Interpretation**: Analyzes the dominant emotions and their relationships based on multiple AUs and their interactions.
- **Conclusion**: Derives the final expression label based on the above analysis.

Additionally, the paper introduces the **Exp-CoT Engine** to construct this expression chain of thought and generate instruction-description data pairs for training ExpLLM. Through this method, ExpLLM not only improves the accuracy of facial expression recognition but also generates detailed and logically coherent facial expression descriptions.

### Main Contributions

1. **Innovative Model**: Proposes ExpLLM, providing an intuitive and step-by-step facial expression analysis method, enhancing the transparency and interpretability of FER.
2. **Data Construction Engine**: Designs the Exp-CoT Engine to generate high-quality instruction-description data pairs, ensuring the model can accurately generate the chain of thought for facial expressions.
3. **Experimental Validation**: Conducts extensive experiments on multiple FER datasets, demonstrating the superiority of ExpLLM in accuracy and generating high-quality expression descriptions, particularly outperforming the latest GPT-4o model in recognizing micro-expressions.
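To make the three-part chain of thought more concrete, the sketch below shows one way such a record could be represented and turned into an instruction-description pair. It is an illustration only, not the paper's Exp-CoT Engine: the class names, field names, intensity wording, prompt text, and output layout are all assumptions made for this example. Only the three-part structure (key observations, overall emotional interpretation, conclusion) comes from the summary above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AUObservation:
    """One key observation: an AU's name, intensity, and associated emotions."""
    name: str            # FACS action unit, e.g. "AU12 (Lip Corner Puller)"
    intensity: str       # hypothetical coarse scale, not the paper's format
    emotions: List[str]  # emotions commonly associated with this AU


@dataclass
class ExpressionCoT:
    """Hypothetical container for the three CoT parts described above."""
    observations: List[AUObservation]
    interpretation: str  # analysis across multiple AUs and their interactions
    conclusion: str      # final expression label


def render_cot(cot: ExpressionCoT) -> str:
    """Render the CoT as plain text (the layout is an assumption)."""
    lines = ["Key Observations:"]
    for obs in cot.observations:
        emotions = ", ".join(obs.emotions)
        lines.append(
            f"- {obs.name}, intensity {obs.intensity}: associated with {emotions}"
        )
    lines.append("Overall Emotional Interpretation:")
    lines.append(cot.interpretation)
    lines.append("Conclusion:")
    lines.append(cot.conclusion)
    return "\n".join(lines)


def make_training_pair(instruction: str, cot: ExpressionCoT) -> Dict[str, str]:
    """Pair an instruction with the rendered CoT description."""
    return {"instruction": instruction, "description": render_cot(cot)}


# Illustrative record; the values are invented for this example.
example = ExpressionCoT(
    observations=[
        AUObservation("AU6 (Cheek Raiser)", "moderate", ["happiness"]),
        AUObservation("AU12 (Lip Corner Puller)", "strong", ["happiness"]),
    ],
    interpretation="AU6 and AU12 occur together, which points to a genuine smile.",
    conclusion="happiness",
)

pair = make_training_pair(
    "Describe the facial expression step by step.",  # hypothetical prompt
    example,
)
print(pair["description"])
```

Keeping the observations as structured fields rather than free text means the final label in the conclusion can be checked against the emotions listed for each AU before a pair is used for training.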

ExpLLM: Towards Chain of Thought for Facial Expression Recognition
