MSDN: A Multistage Deep Network for Heart-Rate Estimation from Facial Videos
Xiaobiao Zhang, Zhaoqiang Xia, Jing Dai, Lili Liu, Jinye Peng, Xiaoyi Feng
DOI: https://doi.org/10.1109/tim.2023.3329095
IF: 5.6
2023-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: Noncontact heart-rate (HR) measurement is an important trend in clinical medicine. Recently, a variety of deep networks have been applied to estimate HR from facial videos. However, because of limited data resources and poor parameter optimization, few existing models perform well in complicated scenarios, such as those with illumination changes, different skin tones, and facial motion. To address these challenges, this article proposes a novel multistage deep network (MSDN) that distributes the learnable parameters across different stages and trains them in multiple steps to reduce the difficulty of learning. Specifically, the proposed network consists of three stages connected end to end. In the first stage, an HR-aware feature extractor uses ConvNeXt, embedded with a newly designed bandpass filter, as its backbone to extract spatial-temporal features for determining HR changes. Moreover, pseudolabels are generated to make the features robust to variations in illumination, motion, and color. In the second stage, several modules, including singular value decomposition (SVD) pooling and enhanced difference convolution (EDC) modules, are designed and combined with a transformer encoder to construct a feature-compressed remote photoplethysmography (rPPG) generator. In the third stage, a newly designed HR estimator with an interbeat interval (IBI) analyzer and a 1-D filter performs the HR estimation. Extensive experiments are performed on three publicly available databases (VIPL-HR, COHFACE, and PURE), and ablation studies and comparisons with state-of-the-art (SOTA) methods demonstrate the effectiveness of the proposed method.
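The abstract does not say how the IBI analyzer in the third stage is implemented. As a rough illustration of the general idea only, the sketch below converts a pulse-like rPPG waveform into an HR value by detecting peaks, measuring the intervals between them, and taking 60 divided by the median interval. It is not the authors' method: the function names `detect_peaks` and `hr_from_ibi`, the `max_bpm` limit, the 30 fps sampling rate, and the synthetic 1.2 Hz sinusoid standing in for an rPPG signal are all assumptions made for this example.

```python
import math
from statistics import median


def detect_peaks(signal, min_distance):
    """Return indices of local maxima at least min_distance samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        # A sample is a peak if it rises from the left and does not rise to the right.
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def hr_from_ibi(signal, fps, max_bpm=180.0):
    """Estimate heart rate in beats per minute from the median interbeat interval."""
    # Assumed physiological limit: beats cannot be closer together than max_bpm allows.
    min_distance = max(1, int(fps * 60.0 / max_bpm))
    peaks = detect_peaks(signal, min_distance)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to form an interbeat interval")
    # Interbeat intervals in seconds between consecutive peaks.
    ibis = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    return 60.0 / median(ibis)


# Synthetic stand-in for an rPPG signal: 10 s of a 1.2 Hz sinusoid sampled at 30 fps.
fps = 30.0
true_hz = 1.2
rppg = [math.sin(2 * math.pi * true_hz * n / fps) for n in range(int(10 * fps))]

estimated_bpm = hr_from_ibi(rppg, fps)
print(f"estimated HR: {estimated_bpm:.1f} bpm")
```

Using the median rather than the mean interval keeps a single missed or spurious peak from shifting the estimate much. Signals recovered from real video are far noisier than this sinusoid and would typically need filtering before peak detection.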