Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS

Stefanos Gkikas, Manolis Tsiknakis
2024-07-29
Abstract: Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. Employing a dual ViT configuration and adopting waveform representations for the fNIRS, as well as for the extracted embeddings from the two modalities, demonstrate the efficacy of the proposed method, achieving an accuracy of 46.76% in the multilevel pain assessment task.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses automatic pain assessment, aiming to assess pain accurately from multimodal data: facial videos and functional near-infrared spectroscopy (fNIRS). Specifically, the research objectives include:

1. **Improve the accuracy of pain assessment**: Use facial expressions and changes in cerebral blood oxygen levels to assess the patient's pain level more accurately.
2. **Develop a modality-independent method**: Reduce the need for domain-specific models, enabling the system to adapt to different types of input data, such as videos and fNIRS signals.
3. **Optimize pain management strategies**: Through automated pain assessment, help doctors better understand the patient's pain state and thereby optimize treatment plans.

### Background and Problem Description of the Paper

Pain is a complex physiological and psychological phenomenon. The International Association for the Study of Pain (IASP) defines it as "an unpleasant sensory and emotional experience associated with actual or potential tissue damage". Pain not only affects an individual's quality of life but also imposes serious social and economic burdens. Effective pain assessment is therefore crucial for early diagnosis, disease monitoring, and the evaluation of treatment effects, especially in chronic pain management. However, pain assessment faces several challenges:

- **Patient communication barriers**: Some patients (such as the elderly or those with limited verbal ability) have difficulty describing their pain clearly.
- **Diversity of pain manifestations**: Pain expression differs significantly across genders and age groups, which complicates assessment.
- **Multimodal data fusion**: Data from different sources (such as facial expressions and physiological signals) must be integrated effectively to improve assessment accuracy.
### Proposed Solutions

To address the above challenges, this study proposes Twins-PainViT, a multimodal framework based on a dual Vision Transformer (ViT) configuration. The main features of this framework include:

- **Modality independence**: Representing the input data uniformly as two-dimensional waveform graphs reduces the need for domain-specific models.
- **Efficient feature extraction and fusion**: Deep learning techniques extract features from facial videos and fNIRS signals and fuse them.
- **Pre-training and multi-task learning**: The model is pre-trained with a multi-task learning strategy to improve its generalization ability and robustness.

The method achieved an accuracy of 46.76% in the multilevel pain assessment task, demonstrating its potential for automatic pain assessment.
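
The idea of turning a one-dimensional signal into a two-dimensional waveform graph, as described above, can be illustrated with a minimal sketch. It is not the authors' rendering procedure, which this summary does not specify. The function name `waveform_image`, the `height` parameter, the min-max normalization, and the binary raster are all assumptions made for illustration.

```python
from typing import List, Sequence


def waveform_image(signal: Sequence[float], height: int = 32) -> List[List[int]]:
    """Rasterize a 1D signal into a binary image with `height` rows.

    Hypothetical helper, not taken from the paper. Each column holds one
    sample. Row 0 is the top of the image, so larger values land on
    smaller row indices.
    """
    if height < 2:
        raise ValueError("height must be at least 2")
    if len(signal) == 0:
        raise ValueError("signal must not be empty")

    lo, hi = min(signal), max(signal)
    span = hi - lo
    image = [[0] * len(signal) for _ in range(height)]

    prev_row = None
    for col, value in enumerate(signal):
        # Min-max normalize to [0, 1]; a flat signal is drawn near the middle.
        norm = (value - lo) / span if span > 0 else 0.5
        row = (height - 1) - round(norm * (height - 1))
        # Fill the vertical gap to the previous sample so the trace is connected.
        start, stop = (row, row) if prev_row is None else sorted((prev_row, row))
        for r in range(start, stop + 1):
            image[r][col] = 1
        prev_row = row
    return image


if __name__ == "__main__":
    # Toy stand-in for an fNIRS channel or an embedding vector.
    demo = waveform_image([0.0, 1.0, 0.5, 0.25, 0.75], height=8)
    for line in demo:
        print("".join("#" if px else "." for px in line))
```

Because the function only expects a sequence of numbers, the same call could in principle be applied to a raw fNIRS channel or to an embedding vector extracted from either modality, which is the sense in which a waveform representation lets one image model handle inputs of different origin.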