DD-rPPGNet: De-interfering and Descriptive Feature Learning for Unsupervised rPPG Estimation

Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu
2024-07-31
Abstract: Remote Photoplethysmography (rPPG) aims to measure physiological signals and Heart Rate (HR) from facial videos. Recent unsupervised rPPG estimation methods have shown promising potential in estimating rPPG signals from facial regions without relying on ground truth rPPG signals. However, these methods seem oblivious to interference existing in rPPG signals and still result in unsatisfactory performance. In this paper, we propose a novel De-interfered and Descriptive rPPG Estimation Network (DD-rPPGNet) to eliminate the interference within rPPG features for learning genuine rPPG signals. First, we investigate the characteristics of local spatial-temporal similarities of interference and design a novel unsupervised model to estimate the interference. Next, we propose an unsupervised de-interfered method to learn genuine rPPG signals with two stages. In the first stage, we estimate the initial rPPG signals by contrastive learning from both the training data and their augmented counterparts. In the second stage, we use the estimated interference features to derive de-interfered rPPG features and encourage the rPPG signals to be distinct from the interference. In addition, we propose an effective descriptive rPPG feature learning by developing a strong 3D Learnable Descriptive Convolution (3DLDC) to capture the subtle chrominance changes for enhancing rPPG estimation. Extensive experiments conducted on five rPPG benchmark datasets demonstrate that the proposed DD-rPPGNet outperforms previous unsupervised rPPG estimation methods and achieves competitive performances with state-of-the-art supervised rPPG methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses interference in remote photoplethysmography (rPPG) signal estimation and proposes an unsupervised method for extracting physiological signals from facial videos without relying on ground-truth rPPG signals for training. Specifically, it points out that current unsupervised rPPG estimation methods can estimate rPPG signals from facial regions to some extent, but they overlook the interference present in those signals and perform poorly on datasets with challenging interference. To address this, the authors design a model called the "De-interfered and Descriptive rPPG Estimation Network" (DD-rPPGNet), which consists of two main parts:

1. **Interference Estimation Branch**: Uses the local spatial-temporal similarity of interference to model and estimate interference signals. This part estimates interference features by analyzing signals from non-facial background regions.
2. **De-interfered rPPG Estimation Branch**: Works in two stages. In the first stage, it estimates initial rPPG signals from the training videos and their augmented counterparts through contrastive learning. In the second stage, it uses the interference features from the interference estimation branch to derive de-interfered rPPG features and encourages the rPPG signals to be distinct from the interference.

Additionally, the paper proposes a 3D Learnable Descriptive Convolution (3DLDC) to capture subtle chrominance changes on the skin, which improves rPPG estimation. Experiments on five rPPG benchmark datasets show that DD-rPPGNet outperforms previous unsupervised rPPG estimation methods and achieves performance competitive with state-of-the-art supervised rPPG methods.
In summary, the paper targets the weakness of current unsupervised rPPG estimation methods on facial videos with challenging interference, and proposes a framework that estimates and removes that interference to improve the accuracy and robustness of rPPG signal estimation.
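
To make the de-interference idea concrete, here is a small signal-level sketch in Python. It is not the DD-rPPGNet method: the paper learns interference and rPPG features with a network, whereas this toy assumes a reference interference signal is already available and removes it by least-squares projection. The function names (`estimate_hr_bpm`, `remove_interference`), the 0.7 to 3.0 Hz heart-rate band, the 30 fps sampling rate, and the synthetic sinusoids are all illustrative assumptions.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, lo_hz=0.7, hi_hz=3.0):
    """Heart rate (beats per minute) at the strongest spectral peak in [lo_hz, hi_hz].

    The band limits are an assumed plausible heart-rate range, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                      # drop the DC component
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)     # frequency of each FFT bin in Hz
    power = np.abs(np.fft.rfft(signal)) ** 2             # power spectrum
    band = (freqs >= lo_hz) & (freqs <= hi_hz)           # keep only the heart-rate band
    return 60.0 * freqs[band][np.argmax(power[band])]


def remove_interference(signal, interference):
    """Subtract the least-squares projection of `signal` onto `interference`.

    Toy stand-in for de-interference: assumes the interference reference is known.
    """
    s = np.asarray(signal, dtype=float)
    n = np.asarray(interference, dtype=float)
    s = s - s.mean()
    n = n - n.mean()
    scale = np.dot(s, n) / np.dot(n, n)                  # best-fit interference amplitude
    return s - scale * n


# Synthetic example (all values are assumptions chosen for illustration).
fs = 30.0                                   # frames per second
t = np.arange(600) / fs                     # 20 seconds of samples
pulse = np.sin(2 * np.pi * 1.2 * t)         # 1.2 Hz pulse, i.e. 72 bpm
interference = np.sin(2 * np.pi * 2.0 * t)  # 2.0 Hz disturbance, i.e. 120 bpm
observed = pulse + 2.0 * interference       # interference dominates the pulse

cleaned = remove_interference(observed, interference)

print("HR from observed signal:", round(estimate_hr_bpm(observed, fs), 1))
print("HR from cleaned signal: ", round(estimate_hr_bpm(cleaned, fs), 1))
```

In this synthetic setup the spectral peak of the observed signal should sit at the interference frequency (about 120 bpm), and after the projection is removed the peak should move to the pulse frequency (about 72 bpm). That is the failure mode the summary describes: an estimator that ignores interference can lock onto the wrong periodic component.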