Abstract: Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. By employing a dual ViT configuration and adopting waveform representations for the fNIRS signals, as well as for the embeddings extracted from the two modalities, the proposed method achieves an accuracy of 46.76% in the multilevel pain assessment task, demonstrating its efficacy.
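The key idea in the abstract is that the fNIRS signals and the embeddings extracted from both modalities are turned into waveform images, so a single image model can consume every input. The following is a minimal NumPy sketch of such a rendering step. The helper name `signal_to_waveform_image`, the 224×224 output size, the min-max normalization, and the binary line drawing are all assumptions for illustration; this is not the authors' plotting pipeline.

```python
import numpy as np

def signal_to_waveform_image(signal, height=224, width=224):
    """Rasterize a 1-D sequence into a binary 2-D waveform image.

    Hypothetical helper (not from the paper): resamples the sequence to
    `width` columns, min-max normalizes it, and draws a connected line.
    """
    x = np.asarray(signal, dtype=np.float64).ravel()
    if x.size < 2:
        raise ValueError("need at least two samples")
    # Resample to one value per image column by linear interpolation.
    y = np.interp(np.linspace(0.0, 1.0, width), np.linspace(0.0, 1.0, x.size), x)
    # Min-max normalize to [0, 1]; a flat signal is drawn at mid-height.
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.full_like(y, 0.5)
    # Map amplitudes to row indices (row 0 is the top of the image).
    rows = np.round((1.0 - y) * (height - 1)).astype(int)
    img = np.zeros((height, width), dtype=np.uint8)
    for col in range(width):
        # Fill the vertical gap to the previous column so the line stays connected.
        prev = rows[col - 1] if col > 0 else rows[col]
        lo, hi = sorted((prev, rows[col]))
        img[lo:hi + 1, col] = 255
    return img

# Synthetic stand-ins (assumptions): one fNIRS-like channel and one embedding vector.
t = np.linspace(0.0, 4.0 * np.pi, 1000)
fnirs_like = np.sin(t)
embedding = np.random.default_rng(0).normal(size=512)
img_fnirs = signal_to_waveform_image(fnirs_like)
img_embed = signal_to_waveform_image(embedding)
print(img_fnirs.shape, img_embed.shape)  # (224, 224) (224, 224)
```

Because a 1000-sample physiological channel and a 512-dimensional embedding both end up as same-sized images, one image backbone can process either. This is the sense in which the approach is modality-agnostic.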

What problem does this paper attempt to address?

This paper addresses automatic pain assessment. Its aim is to estimate pain accurately from multimodal data: facial videos and functional near-infrared spectroscopy (fNIRS). Specifically, the research objectives are:

1. **Improve the accuracy of pain assessment**: Use facial expressions and changes in cerebral blood oxygenation to assess the patient's pain level more accurately.
2. **Develop a modality-independent method**: Reduce the need for domain-specific models, enabling the system to handle different types of input data, such as videos and fNIRS signals.
3. **Optimize pain management strategies**: Use automated pain assessment to help clinicians better understand the patient's pain state and thereby optimize treatment plans.

### Background and Problem Description of the Paper

Pain is a complex physiological and psychological phenomenon. The International Association for the Study of Pain (IASP) defines it as "an unpleasant sensory and emotional experience associated with actual or potential tissue damage". Pain not only reduces an individual's quality of life but also imposes serious social and economic burdens. Effective pain assessment is therefore crucial for early diagnosis, disease monitoring, and evaluation of treatment effects, especially in chronic pain management. However, pain assessment faces several challenges:

- **Patient communication barriers**: Some patients (such as the elderly or those with limited verbal ability) have difficulty describing their pain clearly.
- **Diversity of pain manifestations**: Pain expression differs significantly across genders and age groups, which complicates assessment.
- **Multimodal data fusion**: Data from different sources (such as facial expressions and physiological signals) must be integrated effectively to improve assessment accuracy.
### Proposed Solutions

To address these challenges, this study proposes Twins-PainViT, a multimodal framework based on a dual Vision Transformer (ViT) configuration. Its main features are:

- **Modality independence**: The fNIRS signals and the embeddings extracted from both modalities are represented uniformly as two-dimensional waveform images, which removes the need for domain-specific models.
- **Efficient feature extraction and fusion**: Deep learning techniques extract features from the facial videos and fNIRS signals and fuse them effectively.
- **Pre-training and multi-task learning**: The model is pre-trained with a multi-task learning strategy to improve its generalization ability and robustness.

The method achieved an accuracy of 46.76% in the multilevel pain assessment task, demonstrating its potential for automatic pain assessment.
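The multi-task pre-training idea can be illustrated with the common hard-parameter-sharing pattern. In this pattern, one shared encoder produces features, each task has its own classification head, and the training objective is a weighted sum of the per-task losses. The sketch below computes only that forward loss in NumPy. The linear heads, the task names `task_a`/`task_b`, their class counts, and the fixed loss weights are hypothetical; the summary above does not describe the paper's actual pre-training tasks or weighting scheme.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer `labels` under softmax(`logits`)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(labels.size), labels].mean())

def multitask_loss(features, heads, labels, weights):
    """Weighted sum of per-task losses computed on shared features.

    features: (batch, dim) output of a shared encoder (stand-in here).
    heads:    task -> (dim, num_classes) linear head weights (hypothetical).
    labels:   task -> (batch,) integer class labels.
    weights:  task -> fixed scalar loss weight (hypothetical values).
    """
    per_task = {}
    for task, w_head in heads.items():
        logits = features @ w_head  # task-specific head on shared features
        per_task[task] = softmax_cross_entropy(logits, labels[task])
    total = sum(weights[t] * per_task[t] for t in per_task)
    return total, per_task

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 64))  # synthetic batch of shared features
heads = {
    "task_a": np.zeros((64, 7)),  # zero head gives uniform predictions
    "task_b": rng.normal(scale=0.1, size=(64, 3)),
}
labels = {
    "task_a": rng.integers(0, 7, size=8),
    "task_b": rng.integers(0, 3, size=8),
}
weights = {"task_a": 1.0, "task_b": 0.5}
total, per_task = multitask_loss(features, heads, labels, weights)
```

In real training, an autodiff framework would backpropagate `total` through every head and into the shared encoder, so all tasks shape the same representation. That shared representation is what the pre-training is meant to make more general and robust.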

Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS

Multimodal automatic assessment of acute pain through facial videos and heart rate signals utilizing transformer-based architectures

Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture

Using sensor-fusion and machine-learning algorithms to assess acute pain in non-verbal infants: a study protocol

Multi-Modal Pain Intensity Assessment Based on Physiological Signals: A Deep Learning Perspective

Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints

Multi-Modal Pain Intensity Recognition Based on the SenseEmotion Database

Multi-task Neural Networks for Personalized Pain Recognition from Physiological Signals

Pain Analysis using Adaptive Hierarchical Spatiotemporal Dynamic Imaging

Multi-task multiple kernel machines for personalized pain recognition from functional near-infrared spectroscopy brain signals

DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation of Self-Reported Pain

Two-Stream Attention Network for Pain Recognition from Video Sequences

An Automatic System for Continuous Pain Intensity Monitoring Based on Analyzing Data from Uni-, Bi-, and Multi-Modality

Multi-task Neural Networks for Pain Intensity Estimation using Electrocardiogram and Demographic Factors

Fusing Deep Learned and Hand-Crafted Features of Appearance, Shape, and Dynamics for Automatic Pain Estimation

Multimodal Affective State Assessment Using fNIRS + EEG and Spontaneous Facial Expression

Imaging the neural substrate of trigeminal neuralgia pain using deep learning

Multimodal physiological sensing for the assessment of acute pain

Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives

Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics

Automatic Estimation of Self-Reported Pain by Trajectory Analysis in the Manifold of Fixed Rank Positive Semi-Definite Matrices