CalibrationPhys: Self-supervised Video-based Heart and Respiratory Rate Measurements by Calibrating Between Multiple Cameras

Yusuke Akamatsu, Terumi Umematsu, Hitoshi Imaoka
DOI: https://doi.org/10.1109/JBHI.2023.3345486
2024-01-11
Abstract: Video-based heart and respiratory rate measurements using facial videos are more useful and user-friendly than traditional contact-based sensors. However, most of the current deep learning approaches require ground-truth pulse and respiratory waves for model training, which are expensive to collect. In this paper, we propose CalibrationPhys, a self-supervised video-based heart and respiratory rate measurement method that calibrates between multiple cameras. CalibrationPhys trains deep learning models without supervised labels by using facial videos captured simultaneously by multiple cameras. Contrastive learning is performed so that the pulse and respiratory waves predicted from the synchronized videos using multiple cameras are positive and those from different videos are negative. CalibrationPhys also improves the robustness of the models by means of a data augmentation technique and successfully leverages a pre-trained model for a particular camera. Experimental results utilizing two datasets demonstrate that CalibrationPhys outperforms state-of-the-art heart and respiratory rate measurement methods. Since we optimize camera-specific models using only videos from multiple cameras, our approach makes it easy to use arbitrary cameras for heart and respiratory rate measurements.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to measure heart rate (HR) and respiratory rate (RR) across different cameras without supervised labels (ground-truth pulse and respiratory waves), which are expensive and hard to obtain. It proposes CalibrationPhys, a self-supervised method that calibrates between multiple cameras and uses contrastive learning to train deep learning models without such labels.

### Main Issues

1. **Avoiding the use of supervised labels**: Deep learning-based HR and RR measurement methods typically require a large number of supervised labels. These are usually collected with contact sensors (such as pulse sensors and respiratory belts), which makes them expensive and time-consuming to gather.
2. **Adapting to different cameras**: Model performance varies between cameras because of differences in color and internal image processing. Changing cameras usually means re-collecting facial videos and supervised labels to train a camera-specific model, which is time-consuming and labor-intensive.

### Solution

- **Self-supervised learning**: CalibrationPhys uses facial videos captured simultaneously by multiple cameras. Through contrastive learning, pulse and respiratory waves predicted from synchronized videos are treated as positive pairs, and those predicted from different videos as negative pairs.
- **Data augmentation**: A temporal augmentation technique upsamples or downsamples the training data to widen the range of heart and respiratory rates it covers, which improves model robustness.
- **Pre-trained model**: If an HR or RR estimation model has already been trained for one camera, it can serve as a pre-trained model. That model is kept fixed and only the model for the new camera is trained, which achieves domain adaptation.

### Experimental Results

- **Performance superior to existing methods**: On two datasets, CalibrationPhys outperforms existing HR and RR estimation methods.
- **Low computational cost**: The 2D convolutional neural network used by CalibrationPhys has a lower computational cost than traditional 3D convolutional neural networks, making it better suited to resource-limited devices such as smartphones.

### Summary

CalibrationPhys removes the dependency on supervised labels when measuring heart rate and respiratory rate across different cameras by combining self-supervised and contrastive learning. It improves model robustness and adaptability, making it easier to use arbitrary cameras.
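
The contrastive pairing and temporal resampling described under Solution can be illustrated with a minimal sketch. It is not the paper's implementation: the InfoNCE-style loss over Pearson correlation is a swapped-in stand-in for whatever loss the authors use, the 1-D lists stand in for waves predicted by camera-specific models, and every name here (`synthetic_wave`, `pearson`, `contrastive_loss`, `resample`, `temperature`) is hypothetical.

```python
import math


def synthetic_wave(rate_bpm, fps=30, seconds=10, gain=1.0, offset=0.0):
    """Hypothetical stand-in for a pulse or respiratory wave predicted by a model."""
    n = fps * seconds
    return [offset + gain * math.sin(2 * math.pi * rate_bpm / 60 * i / fps)
            for i in range(n)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b + 1e-12)


def contrastive_loss(waves_cam_a, waves_cam_b, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's loss).

    waves_cam_a[i] and waves_cam_b[i] are predicted from the same synchronized
    recording and form the positive pair; every other pairing is a negative.
    """
    total = 0.0
    for i, wave_a in enumerate(waves_cam_a):
        logits = [pearson(wave_a, wave_b) / temperature for wave_b in waves_cam_b]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(waves_cam_a)


def resample(signal, factor):
    """Linearly resample a sequence in time.

    factor > 1 stretches the signal, so at a fixed frame rate the apparent
    rate drops; factor < 1 compresses it and raises the apparent rate.
    """
    n_out = max(2, round(len(signal) * factor))
    out = []
    for i in range(n_out):
        pos = i * (len(signal) - 1) / (n_out - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out
```

With two synthetic recordings at 60 and 90 beats per minute, where the second camera differs only in gain and offset, correctly paired waves give a loss near zero, while swapping the pairing gives a much larger loss. In this sketch the resampling acts on a 1-D sequence for brevity; applying the same factor to the synchronized clips from every camera would keep the positive pairs aligned.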