Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement

Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao
DOI: https://doi.org/10.48550/arXiv.1907.11921
2019-07-27
Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activities without any contact, has great potential in many applications (e.g., remote healthcare). Existing rPPG approaches rely on analyzing very fine details of facial videos, which are prone to be affected by video compression. Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. The method includes two parts: 1) a Spatio-Temporal Video Enhancement Network (STVEN) for video enhancement, and 2) an rPPG network (rPPGNet) for rPPG signal recovery. The rPPGNet can work on its own for robust rPPG measurement, and the STVEN network can be added and jointly trained to further boost the performance, especially on highly compressed videos. Comprehensive experiments are performed on two benchmark datasets to show that 1) the proposed method not only achieves superior performance on compressed videos with paired high-quality videos, but 2) it also generalizes well on novel data with only compressed videos available, which implies promising potential for real-world applications.
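The abstract frames the end goal as measuring heart activity from a recovered rPPG signal. A common final step, shown here as a generic sketch rather than the authors' evaluation code, is to take the dominant frequency of the signal within a plausible heart-rate band and convert it to beats per minute. The function name `estimate_heart_rate_bpm`, the band limits (42 to 240 bpm), and the plain DFT peak-picking are assumptions chosen for illustration.

```python
import math


def estimate_heart_rate_bpm(signal, fps, min_bpm=42.0, max_bpm=240.0):
    """Estimate heart rate (bpm) as the dominant frequency of an rPPG trace.

    Hypothetical helper: the band [min_bpm, max_bpm] and naive DFT
    peak-picking are illustrative choices, not the paper's protocol.
    Frequency resolution is 60 * fps / len(signal) bpm.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component

    best_bpm, best_power = None, -1.0
    # Scan DFT bins k whose frequency k * fps / n lies inside the band.
    for k in range(1, n // 2 + 1):
        bpm = 60.0 * k * fps / n
        if bpm < min_bpm or bpm > max_bpm:
            continue
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Usage: a synthetic 10 s trace at 30 fps pulsing at 1.2 Hz (72 bpm).
fps = 30
trace = [120 + 0.5 * math.sin(2 * math.pi * 1.2 * t / fps) for t in range(300)]
print(estimate_heart_rate_bpm(trace, fps))  # → 72.0
```

With 300 frames at 30 fps, each DFT bin spans 6 bpm, so longer windows (or an FFT with interpolation around the peak) would be needed for finer heart-rate resolution.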
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses remote heart rate measurement via remote photoplethysmography (rPPG) from highly compressed facial videos. Most existing rPPG methods rely on very subtle details in facial videos, and these details are easily destroyed by video compression. Compression therefore discards part of the rPPG signal and degrades the accuracy of heart rate measurement. Because compression is essential for storing and transmitting video in remote services such as telemedicine, an rPPG method that works robustly on highly compressed videos has great practical value.

To meet this challenge, the authors propose a two-stage, end-to-end method that uses hidden rPPG information enhancement and attention networks to counter compression loss and recover the rPPG signal from highly compressed facial videos. The method consists of two parts:

1. **Spatio-Temporal Video Enhancement Network (STVEN)**: used for video enhancement. Assuming that compression artifacts at different bit rates follow different distributions, it learns in a fine-grained manner to raise the quality of a compressed video to a level close to that of the original video.
2. **rPPG Network (rPPGNet)**: used for rPPG signal recovery. It can work on its own for robust rPPG measurement, or be trained jointly with STVEN to further improve performance, especially on highly compressed videos.

Comprehensive experiments on two benchmark datasets show that the method not only performs well on compressed videos when paired high-quality videos are available, but also generalizes well to new data where only compressed videos are available, indicating strong potential for real-world applications.
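The modular design described above, with STVEN optionally placed in front of rPPGNet and conditioned on the compression level, can be sketched as plain function composition. Everything here is hypothetical: `TwoStagePipeline`, `one_hot`, `joint_loss`, `toy_enhancer`, and `spatial_mean_rppg` are illustrative names, the one-hot level code is just one simple way to tell the enhancer which bit rate it faces, and the weighted-sum objective is an assumed form of joint training, not the paper's loss. The real STVEN and rPPGNet are deep networks, not the toy functions used below.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # toy: one frame flattened to pixel intensities
Video = List[Frame]


def one_hot(level: int, num_levels: int) -> List[float]:
    """Encode a compression-level index (e.g. one bit-rate setting) as one-hot."""
    if not 0 <= level < num_levels:
        raise ValueError(f"level must be in [0, {num_levels})")
    return [1.0 if i == level else 0.0 for i in range(num_levels)]


@dataclass
class TwoStagePipeline:
    # Hypothetical stand-in for STVEN: (compressed video, level code) -> enhanced video.
    enhancer: Callable[[Video, List[float]], Video]
    # Hypothetical stand-in for rPPGNet: video -> one rPPG value per frame.
    rppg_net: Callable[[Video], List[float]]
    num_levels: int

    def __call__(self, video: Video, level: int, use_enhancer: bool = True) -> List[float]:
        if use_enhancer:
            # Stage 1: enhancement conditioned on the compression level.
            video = self.enhancer(video, one_hot(level, self.num_levels))
        # Stage 2: rPPG recovery (rPPGNet alone when use_enhancer is False).
        return self.rppg_net(video)


def joint_loss(enh_loss: float, rppg_loss: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical joint objective: weighted sum of enhancement and rPPG terms."""
    return alpha * enh_loss + beta * rppg_loss


def toy_enhancer(video: Video, level_code: List[float]) -> Video:
    # Purely illustrative: add a brightness offset that grows with the level index.
    level = level_code.index(1.0)
    return [[p + 0.5 * level for p in frame] for frame in video]


def spatial_mean_rppg(video: Video) -> List[float]:
    # Simple baseline: the average intensity of each frame as the raw pulse trace.
    return [sum(frame) / len(frame) for frame in video]


pipe = TwoStagePipeline(toy_enhancer, spatial_mean_rppg, num_levels=3)
video = [[100.0, 102.0], [101.0, 103.0]]
print(pipe(video, level=2))                      # → [102.0, 103.0]
print(pipe(video, level=2, use_enhancer=False))  # → [101.0, 102.0]
```

Setting `use_enhancer=False` mirrors the point that rPPGNet can work on its own, while keeping the enhancer as a separate callable makes it easy to add it in front of rPPGNet and optimize both against a combined objective.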