TranPhys: Spatiotemporal Masked Transformer Steered Remote Photoplethysmography Estimation

Hang Shao, Lei Luo, Jianjun Qian, Shuo Chen, Chuanfei Hu, Jian Yang
DOI: https://doi.org/10.1109/tcsvt.2023.3307700
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Subtle variations in human physiological signals that are invisible to the naked eye can reflect important biological and health indicators. Although numerous computer vision methods have been proposed to recover and magnify these changes, most of them either focus only on identifying and recognizing explicit features such as shapes and textures, or are weak in long-term temporal modeling and spatiotemporal interactive perception of implicit biometrics. Therefore, it is difficult for them to robustly overcome the various disturbances that affect detection performance. To address these issues, this paper presents TranPhys, a novel remote photoplethysmography (rPPG) network for facial video-based heart rate estimation. Specifically, first, we argue that facial subregions vary over time according to their individual biological characteristics. We therefore split the input face video into multiple spatiotemporal tubes, build a 3D vision transformer with encoders and decoders to adequately model the high-dimensional representations of the respective regularities in each subregion, and globally coordinate their feedback on the cardiac pulse waveform. Second, we design a temporal pooling attention to more finely mine the subtle changes hidden in skin color over time and their long-term contextual rhythm cues. Third, we leverage the self-supervised masked autoencoding paradigm to overcome redundancy and enhance the robustness of our model, and construct targeted spatiotemporal sampling maps, instead of raw input sequences, as the pretraining constraint labels to fully exploit self-supervision. We train, validate, and test TranPhys on multiple public datasets to demonstrate that our method achieves competitive performance in remote heart rate estimation.
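The tube splitting and masking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The clip size (160 frames of 128x128 pixels), the tube size (4x16x16), the 0.75 mask ratio, and the helper names `tube_slices` and `split_visible_masked` are all assumptions made for illustration; the abstract gives none of these values. The sketch only covers index bookkeeping and does not reproduce the transformer, the temporal pooling attention, or the spatiotemporal sampling maps used as pretraining targets.

```python
import random
from itertools import product


def tube_slices(video_shape, tube_shape):
    """Return (time, row, col) slice triples for non-overlapping tubes.

    video_shape and tube_shape are (frames, height, width) tuples.
    Both are hypothetical values, not taken from the paper.
    """
    T, H, W = video_shape
    t, h, w = tube_shape
    if T % t or H % h or W % w:
        raise ValueError("tube_shape must evenly divide video_shape")
    return [
        (slice(i, i + t), slice(j, j + h), slice(k, k + w))
        for i, j, k in product(range(0, T, t), range(0, H, h), range(0, W, w))
    ]


def split_visible_masked(num_tubes, mask_ratio, seed=0):
    """Randomly partition tube indices into visible and masked sets.

    Uniform random masking is an assumption here; the paper may use
    a different masking strategy.
    """
    rng = random.Random(seed)
    order = list(range(num_tubes))
    rng.shuffle(order)
    num_masked = int(num_tubes * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Assumed clip: 160 frames of 128x128 pixels, cut into 4x16x16 tubes.
tubes = tube_slices((160, 128, 128), (4, 16, 16))
visible, masked = split_visible_masked(len(tubes), mask_ratio=0.75)
```

In a masked-autoencoding setup of this kind, only the visible tubes would be fed to the encoder, and the decoder would be trained to predict a target for the masked ones.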