Abstract: Video-based heart and respiratory rate measurements using facial videos are more useful and user-friendly than traditional contact-based sensors. However, most of the current deep learning approaches require ground-truth pulse and respiratory waves for model training, which are expensive to collect. In this paper, we propose CalibrationPhys, a self-supervised video-based heart and respiratory rate measurement method that calibrates between multiple cameras. CalibrationPhys trains deep learning models without supervised labels by using facial videos captured simultaneously by multiple cameras. Contrastive learning is performed so that the pulse and respiratory waves predicted from the synchronized videos using multiple cameras are positive and those from different videos are negative. CalibrationPhys also improves the robustness of the models by means of a data augmentation technique and successfully leverages a pre-trained model for a particular camera. Experimental results utilizing two datasets demonstrate that CalibrationPhys outperforms state-of-the-art heart and respiratory rate measurement methods. Since we optimize camera-specific models using only videos from multiple cameras, our approach makes it easy to use arbitrary cameras for heart and respiratory rate measurements.
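The positive/negative pairing described in the abstract can be illustrated with a small NumPy sketch. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses an InfoNCE-style loss over Pearson correlations as a stand-in, since the abstract does not specify the loss function. The names `multi_camera_contrastive_loss`, `normalize`, `make_clips`, and the `temperature` parameter are hypothetical, and the "predicted waveforms" are synthetic noisy sines rather than model outputs.

```python
import numpy as np


def normalize(waves):
    """Give each row (one predicted waveform) zero mean and unit length."""
    waves = np.asarray(waves, dtype=np.float64)
    centered = waves - waves.mean(axis=1, keepdims=True)
    return centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-8)


def multi_camera_contrastive_loss(waves_a, waves_b, temperature=0.1):
    """InfoNCE-style loss (an illustrative choice, not the paper's loss).

    waves_a[i] and waves_b[i] are waveforms predicted from cameras A and B
    for the same synchronized clip i (positive pair); waves_a[i] and
    waves_b[j] with i != j come from different videos (negative pairs).
    """
    a, b = normalize(waves_a), normalize(waves_b)
    sim = (a @ b.T) / temperature               # (N, N) scaled Pearson correlations
    sim = sim - sim.max(axis=1, keepdims=True)  # shift rows for numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def make_clips(freqs_hz, fs=30.0, n_frames=300, noise=0.1, seed=0):
    """Hypothetical stand-in for model outputs: one noisy sine per clip."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) / fs
    clean = np.sin(2.0 * np.pi * np.asarray(freqs_hz)[:, None] * t)
    return clean + noise * rng.standard_normal(clean.shape)


# Four clips with different pulse frequencies, seen by two synchronized cameras.
freqs = [1.0, 1.3, 1.7, 2.1]
cam_a = make_clips(freqs, seed=0)
cam_b = make_clips(freqs, seed=1)
aligned = multi_camera_contrastive_loss(cam_a, cam_b)
mismatched = multi_camera_contrastive_loss(cam_a, cam_b[[1, 2, 3, 0]])
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {mismatched:.4f}")
```

The loss is low when waveforms from the two cameras agree for the same clip and differ across clips, which is the training signal that replaces ground-truth labels in this kind of setup.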

What problem does this paper attempt to address?

The paper addresses how to measure heart rate (HR) and respiratory rate (RR) with different cameras without relying on supervised labels (ground-truth pulse and respiratory waves), which are expensive and hard to obtain. It proposes CalibrationPhys, a self-supervised method that calibrates between multiple cameras and uses contrastive learning to train deep learning models without such labels.

### Main Issues

1. **Avoiding the use of supervised labels**: Existing deep learning-based heart rate and respiratory rate measurement methods require a large number of supervised labels. These are usually collected with contact sensors (such as pulse sensors and respiratory belts), which makes them expensive and time-consuming to obtain.
2. **Adapting to different cameras**: Model performance varies between cameras because of differences in color and internal image processing. Changing cameras usually means re-collecting facial videos and supervised labels to train a camera-specific model, which is time-consuming and labor-intensive.

### Solution

- **Self-supervised learning**: CalibrationPhys uses facial videos captured simultaneously by multiple cameras. Through contrastive learning, it treats the pulse and respiratory waveforms predicted from synchronized videos as positive samples and those predicted from different videos as negative samples.
- **Data augmentation**: A temporal augmentation technique upsamples or downsamples the videos to widen the range of heart rates and respiratory rates in the training dataset, which improves the model's robustness.
- **Pre-trained model**: If a heart rate or respiratory rate estimation model has already been trained for one camera, it can be used as a pre-trained model. The pre-trained model is kept fixed and only the model for the new camera is trained, which achieves domain adaptation.

### Experimental Results

- **Performance superior to existing methods**: Experimental results show that CalibrationPhys outperforms existing heart rate and respiratory rate estimation methods on two datasets.
- **Low computational cost**: The 2D convolutional neural network used by CalibrationPhys has a lower computational cost than conventional 3D convolutional neural networks, making it better suited to resource-limited devices such as smartphones.

### Summary

CalibrationPhys removes the dependency on supervised labels when measuring heart rate and respiratory rate with different cameras by combining self-supervised and contrastive learning. It improves the robustness and adaptability of the model, making it easier to use with any camera.
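The temporal augmentation idea from the Solution section (upsampling or downsampling to vary the apparent heart and respiratory rate) can be sketched as follows. This is a rough illustration under assumptions, not the authors' code: `resample_time`, `dominant_rate_per_min`, and the `speed` parameter are hypothetical names, linear interpolation between neighboring frames is an assumed resampling scheme, and a synthetic 1.2 Hz sine stands in for a pulse signal.

```python
import numpy as np


def resample_time(clip, speed):
    """Linearly resample a clip along axis 0 (time).

    speed > 1 keeps fewer frames, so at the original frame rate the
    apparent rate rises; speed < 1 yields more frames and a lower rate.
    Works for 1-D signals and for video arrays shaped (T, H, W, C).
    """
    clip = np.asarray(clip, dtype=np.float64)
    n_in = clip.shape[0]
    n_out = max(2, int(round(n_in / speed)))
    pos = np.linspace(0.0, n_in - 1, n_out)   # fractional source positions
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = (pos - lo).reshape((-1,) + (1,) * (clip.ndim - 1))
    return (1.0 - frac) * clip[lo] + frac * clip[hi]


def dominant_rate_per_min(signal, fs):
    """Return the strongest frequency of a 1-D signal in cycles per minute."""
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return 60.0 * freqs[np.argmax(spectrum)]


fs = 30.0                                   # assumed camera frame rate
t = np.arange(int(20 * fs)) / fs            # 20 seconds of samples
pulse = np.sin(2.0 * np.pi * 1.2 * t)       # synthetic 72 beats/min pulse

original_bpm = dominant_rate_per_min(pulse, fs)
faster_bpm = dominant_rate_per_min(resample_time(pulse, 1.25), fs)
slower_bpm = dominant_rate_per_min(resample_time(pulse, 0.8), fs)
print(original_bpm, faster_bpm, slower_bpm)
```

Because the same `speed` factor would be applied to every camera's copy of a clip, the synchronized videos stay consistent with each other while the training set covers a wider range of rates.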

CalibrationPhys: Self-supervised Video-based Heart and Respiratory Rate Measurements by Calibrating Between Multiple Cameras
