GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement

Hao Wang, Euijoon Ahn, Jinman Kim
2024-06-19
Abstract: Remote physiological measurement (RPM) is an essential tool for healthcare monitoring as it enables the measurement of physiological signs, e.g., heart rate, in a remote setting via physical wearables. Recently, with facial videos, we have seen rapid advancements in video-based RPMs. However, adopting facial videos for RPM in the clinical setting largely depends on the accuracy and robustness (work across patient populations). Fortunately, the capability of the state-of-the-art transformer architecture in general (natural) video understanding has resulted in marked improvements and has been translated to facial understanding, including RPM. However, existing RPM methods usually need RPM-specific modules, e.g., temporal difference convolution and handcrafted feature maps. Although these customized modules can increase accuracy, they are not demonstrated for their robustness across datasets. Further, due to their customization of the transformer architecture, they cannot use the advancements made in general video transformers (GVT). In this study, we interrogate the GVT architecture and empirically analyze how the training designs, i.e., data pre-processing and network configurations, affect the model performance applied to RPM. Based on the structure of video transformers, we propose to configure its spatiotemporal hierarchy to align with the dense temporal information needed in RPM for signal feature extraction. We define several practical guidelines and gradually adapt GVTs for RPM without introducing RPM-specific modules. Our experiments demonstrate favorable results to existing RPM-specific module counterparts. We conducted extensive experiments with five datasets using intra-dataset and cross-dataset settings. We highlight that the proposed guidelines GVT2RPM can be generalized to any video transformers and is robust to various datasets.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses how to adapt General Video Transformers (GVTs) to Remote Physiological Measurement (RPM). Specifically, its goals are:

1. **Improve the accuracy and robustness of RPM**: Existing RPM methods usually rely on RPM-specific modules (such as temporal difference convolution and handcrafted feature maps). Although these modules can improve accuracy, their robustness across datasets has not been demonstrated, and they cannot benefit from the latest advances in general video transformers.
2. **Avoid customized modules**: Use the general video transformer architecture directly for RPM tasks, without introducing RPM-specific modules, so that the model keeps its generalization ability and portability.
3. **Explore the optimal configuration**: Through empirical study, propose a series of practical guidelines that optimize the performance of GVTs on RPM tasks. These cover specific adjustments to data pre-processing and network configuration, aimed at stable and efficient behavior across datasets.

### Research background

- **Remote Physiological Measurement (RPM)**: RPM is an important healthcare monitoring tool that can measure physiological signals such as heart rate in a non-contact manner, for example from facial videos.
- **Existing challenges**: Existing RPM methods usually require specific modules to enhance temporal signal extraction, but these modules are difficult to generalize to different datasets and video transformer architectures.
- **Advantages of General Video Transformers (GVTs)**: GVTs perform well in natural video understanding, have a large receptive field and better long-range dependency modeling ability, and are in principle also suitable for RPM tasks.

### Solution

The paper proposes GVT2RPM, a set of guidelines for adapting general video transformers to RPM tasks through targeted adjustments.

Specific measures include:

- **Data pre-processing**: Adjust the input dimension, output format, frame format and signal normalization.
- **Network configuration**: Select appropriate position encoding and scaling strategies to better capture temporal and spatial information.

With these adjustments, GVT2RPM achieves results on multiple public datasets that compare favorably with methods built on RPM-specific modules, and it shows good cross-dataset robustness.

### Experimental results

The paper evaluates GVT2RPM through extensive experiments on five commonly used public datasets, in both intra-dataset and cross-dataset settings. The results show that GVT2RPM compares favorably with existing RPM-specific methods and generalizes well across datasets.

### Summary

This paper shows empirically that general video transformers can be applied directly to remote physiological measurement tasks, and it proposes specific optimization guidelines that provide a useful reference for future research.
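
The data pre-processing item under Solution names frame format and signal normalization without saying which variants the paper settles on. As a rough illustration only, the sketch below shows two choices that are common in video-based heart-rate work: normalized difference frames and a z-scored reference signal. Both choices, the function names `diff_normalize` and `standardize`, the `EPS` constant and the toy data are assumptions made for this example, not details taken from the paper. Plain Python lists are used to keep the example dependency-free; a real pipeline would normally use an array library.

```python
from statistics import fmean, pstdev

EPS = 1e-7  # hypothetical guard against division by zero


def standardize(values):
    """Z-score a 1-D sequence so it has zero mean and unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + EPS) for v in values]


def diff_normalize(frames):
    """Turn T frames into T-1 normalized difference frames.

    Each frame is a flat list of pixel intensities of equal length.
    The per-pixel difference is divided by the per-pixel sum, then the
    whole clip is rescaled by its global standard deviation.
    """
    diffs = [
        [(b - a) / (b + a + EPS) for a, b in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]
    scale = pstdev([v for frame in diffs for v in frame]) + EPS
    return [[v / scale for v in frame] for frame in diffs]


# Toy clip: 4 frames of 3 pixels each, plus a 4-sample reference signal.
frames = [
    [10.0, 20.0, 30.0],
    [11.0, 19.0, 30.0],
    [12.0, 21.0, 29.0],
    [11.0, 20.0, 31.0],
]
signal = [0.2, 0.5, 0.9, 0.4]

diff_frames = diff_normalize(frames)
norm_signal = standardize(signal)
print(len(diff_frames), len(diff_frames[0]))
```

Note that differencing shortens the clip by one frame, so the target signal would need to be trimmed or differenced to match before training; how the paper handles this alignment is not stated in the summary above.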