Abstract: This paper demonstrates the effectiveness of a diversification mechanism for building a more robust multi-attention system in generic facial action analysis. Previous multi-attention research (e.g., on visual attention and self-attention) for facial expression recognition (FER) and Action Unit (AU) detection has focused on “external attention diversification”, where attention branches localize different facial areas. In contrast, we investigate “internal attention diversification” and explore the impact of diverse attention patterns within the same Region of Interest (RoI). Our experiments reveal that variability in attention patterns significantly impacts model performance, indicating that unconstrained multi-attention is plagued by redundancy and over-parameterization, which leads to sub-optimal results. To tackle this issue, we propose a compact module that guides the model to achieve self-diversified multi-attention. Our method is applied to both CNN-based and Transformer-based models and benchmarked on popular databases such as BP4D and DISFA for AU detection, as well as CK+, MMI, BU-3DFE, and BP4D+ for facial expression recognition. We also evaluate the mechanism on self-attention and channel-wise attention designs to improve their adaptive capabilities in multi-modal feature fusion tasks. The multi-modal evaluation is conducted on BP4D, BP4D+, and our newly developed large-scale comprehensive emotion database BP4D++, which contains well-synchronized and aligned sensor modalities and addresses the scarcity of annotations and identities in human affective computing. We plan to release the new database to the research community to foster further advances in this field.

Disagreement Matters: Exploring Internal Diversification for Redundant Attention in Generic Facial Action Analysis
