AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses

Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen
2024-05-22
Abstract: Recent advancements in augmented reality (AR) have enabled the use of various sensors on smart glasses for applications like facial reconstruction, which is vital to improve AR experiences for virtual social activities. However, the size and power constraints of smart glasses demand a miniature and low-power sensing solution. AUGlasses achieves unobtrusive low-power facial reconstruction by placing inertial measurement units (IMUs) against the temporal area on the face to capture the skin deformations caused by facial muscle movements. These IMU signals, along with historical data on facial action units (AUs), are processed by a transformer-based deep learning model to estimate AU intensities in real time, which are then used for facial reconstruction. Our results show that AUGlasses accurately predicts the strength (0-5 scale) of 14 key AUs with a cross-user mean absolute error (MAE) of 0.187 (STD = 0.025) and achieves facial reconstruction with a cross-user MAE of 1.93 mm (STD = 0.353). We also integrated various preprocessing and training techniques to ensure robust performance for continuous sensing. Micro-benchmark tests indicate that our system consistently performs accurate continuous facial reconstruction with a fine-tuned cross-user model, achieving an AU MAE of 0.35.
Human-Computer Interaction, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the size and power constraints that smart glasses face when performing facial reconstruction. Existing facial reconstruction methods usually rely on sensors such as cameras or millimeter-wave radars. Although these devices can capture facial features comprehensively, they require a third-person viewpoint, which is difficult to maintain in some environments. These sensors also consume considerable power and are not suitable for long-term continuous use.

**AUGlasses** achieves unobtrusive, low-power facial reconstruction by integrating low-power inertial measurement units (IMUs) into smart glasses. The IMUs are placed against the temporal region to capture skin deformations caused by facial muscle movements. The IMU signals, together with historical data on facial action units (AUs), are processed by a deep learning model that estimates AU intensities in real time, and these estimates drive the facial reconstruction.

### Main contributions

1. **Low-power and lightweight facial reconstruction method**: Two IMUs on the smart glasses sense skin movements in the temporal region, which keeps the design low-power and lightweight.
2. **Signal processing pipeline design**: A series of signal preprocessing and model training techniques was developed to keep long-term continuous sensing accurate.
3. **Comprehensive prototype evaluation**: The system was evaluated in different test scenarios, demonstrating the feasibility and practicality of IMU-based facial reconstruction.

### Technical details

- **Sensor position optimization**: The optimal IMU placement was determined experimentally. The zygomatic bone above the temporal region was found to be the area where facial muscle changes are captured most clearly.
- **Signal processing**: A motion artifact removal mechanism filters out IMU signal changes caused by head movements.
- **Model training**: A prefix-conditional sequence prediction strategy is proposed, which enables the model to learn long-range dependencies, avoid exposure bias, and improve the accuracy and reliability of continuous facial reconstruction.

### Experimental results

- **AU intensity prediction**: AUGlasses predicts the intensity (on a 0-5 scale) of 14 key AUs with a cross-user mean absolute error (MAE) of 0.187 (standard deviation 0.025).
- **Facial reconstruction**: Based on the predicted AU intensities, the system achieves facial reconstruction with a cross-user MAE of 1.93 mm (standard deviation 0.353).
- **Long-term continuous prediction**: Within 30 seconds, the MAE deteriorates only to 0.4 and 0.5, which supports the system's ability to perform continuous facial reconstruction.

### Summary

AUGlasses addresses the size and power constraints of facial reconstruction on smart glasses through a low-power IMU design combined with signal processing and model training techniques, offering a new approach for applications such as augmented reality (AR).
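
### Illustration: reading the cross-user MAE figures

The results above report an MAE together with a standard deviation across users. The sketch below shows one plausible way such a figure could be computed: an MAE per user, then the mean and spread of those per-user values. This is an assumption about the aggregation, not the authors' evaluation code. The function names, the toy data, and the choice of the population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences of AU intensities."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def cross_user_summary(per_user_pairs):
    """Return (mean, population std) of the per-user MAE values.

    Assumption: the cross-user figure is the average of per-user MAEs,
    and the reported STD is the spread of those per-user values.
    """
    per_user_mae = [mae(pred, true) for pred, true in per_user_pairs]
    return mean(per_user_mae), pstdev(per_user_mae)


# Made-up AU intensities on the 0-5 scale, for illustration only.
toy_data = [
    ([1.0, 2.0, 3.0], [1.0, 2.5, 3.5]),  # hypothetical user A
    ([0.0, 4.0, 2.0], [0.5, 4.0, 2.0]),  # hypothetical user B
]
avg_mae, std_mae = cross_user_summary(toy_data)
print(f"cross-user MAE = {avg_mae:.3f} (STD = {std_mae:.3f})")
```

Under this reading, an MAE of 0.187 on the 0-5 scale means the predicted AU intensity is off by a little under a fifth of one intensity level on average, and the small STD indicates that this error varies little from user to user.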