LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li
2024-04-08
Abstract: Human pose and shape (HPS) estimation with lensless imaging not only benefits privacy protection but can also be used in covert surveillance scenarios, owing to the small size and simple structure of lensless devices. However, the task is highly challenging because of the inherent ambiguity of the captured measurements, and effective methods for directly estimating human pose and shape from lensless data are lacking. In this paper, we propose, to our knowledge, the first end-to-end framework for recovering 3D human poses and shapes from lensless measurements. We design a multi-scale lensless feature decoder that decodes the lensless measurements through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve estimation accuracy at the ends of human limbs. In addition, we build a lensless imaging system and verify the effectiveness of our method on various datasets acquired with it.
Computer Vision and Pattern Recognition
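Decoding measurements "through the optically encoded mask" presupposes a model of how the mask forms the measurement. A common simplifying assumption for mask-based lensless cameras, which may not match the optics of the authors' prototype, is that the sensor records the scene convolved with the system's point spread function (PSF). The minimal NumPy sketch below simulates such a measurement and inverts it with classical regularized (Wiener-style) deconvolution. It only illustrates why a lensless measurement looks nothing like an image yet still encodes one. It is not LPSNet's MSFDecoder, and the names `lensless_forward`, `wiener_deconvolve`, and `rel_err` are hypothetical.

```python
import numpy as np


def lensless_forward(scene: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Simulate a lensless measurement as the circular 2D convolution of the
    scene with a PSF of the same shape, computed via the FFT.
    (Assumed convolutional model; real prototypes may deviate from it.)"""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))


def wiener_deconvolve(measurement: np.ndarray, psf: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Regularized inverse X = conj(H) * Y / (|H|^2 + eps) in the Fourier domain.

    `eps` damps frequencies where the PSF response is weak; it should be
    raised when the measurement contains noise.
    """
    H = np.fft.fft2(psf)
    Y = np.fft.fft2(measurement)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(X))


def rel_err(a: np.ndarray, b: np.ndarray) -> float:
    """Relative L2 error of `a` with respect to reference `b`."""
    return float(np.linalg.norm(a - b) / np.linalg.norm(b))


rng = np.random.default_rng(0)
scene = rng.random((64, 64))  # stand-in for a grayscale image
psf = rng.random((64, 64))    # stand-in for a pseudo-random mask PSF
psf /= psf.sum()              # normalize so total light is preserved

measurement = lensless_forward(scene, psf)
recovered = wiener_deconvolve(measurement, psf)

print(f"measurement vs scene: {rel_err(measurement, scene):.3f}")  # large: the image is scrambled
print(f"recovered vs scene:   {rel_err(recovered, scene):.2e}")    # small in this noise-free case
```

The noise-free setting is deliberately idealized. With sensor noise, `eps` has to be increased, which trades reconstruction sharpness for stability.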
What problem does this paper attempt to address?
The paper addresses the problem of estimating human pose and shape from lensless imaging systems. Specifically, the authors propose an end-to-end framework named LPSNet that recovers 3D human pose and shape directly from lensless measurements, without first reconstructing RGB images from them. Solving this problem helps protect privacy. In addition, because lensless imaging systems are small, simple, and low-cost, it also enables covert surveillance applications.

### Main Challenges

1. **Feature Extraction**: How to extract features from lensless measurements that are effective for human pose and shape estimation.
2. **Limb Estimation Accuracy**: Early experiments showed that limb estimates were inaccurate when human pose and shape were estimated from features extracted from lensless measurements.

### Solutions

1. **Multi-Scale Lensless Feature Decoder (MSFDecoder)**: A multi-scale decoder designed to decode the information produced by the lensless imaging system and extract global features.
2. **Double-Head Auxiliary Supervision (DHAS)**: A mechanism that adds auxiliary supervision during training to improve the accuracy of human limb estimation.

### Main Contributions

1. **LPSNet**: The first end-to-end network, to the authors' knowledge, that directly estimates 3D human pose and shape from lensless measurements.
2. **MSFDecoder**: A multi-scale lensless feature decoder that efficiently extracts features from lensless measurements.
3. **DHAS**: A double-head auxiliary supervision mechanism that significantly improves the accuracy of human limb estimation.
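The summary says only that DHAS adds auxiliary supervision during training to improve limb accuracy. It does not specify what the two heads predict. The sketch below illustrates the general auxiliary-supervision pattern under loudly stated assumptions. Two hypothetical auxiliary heads each regress the same 3D joints as the main head. Their losses are added to the main loss with a weight `lambda_aux`, and limb-end joints are upweighted by `limb_weight`. The joint indices (ankles and wrists in a common 17-joint Human3.6M ordering), `limb_weight`, and `lambda_aux` are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Hypothetical limb-end joint indices (ankles and wrists) in a common
# 17-joint Human3.6M ordering; the paper's actual joint set is not given here.
LIMB_END_JOINTS = [3, 6, 13, 16]


def weighted_joint_loss(pred: np.ndarray, target: np.ndarray, limb_weight: float = 2.0) -> float:
    """Weighted mean squared joint error for arrays of shape (J, 3).

    Joints listed in LIMB_END_JOINTS get weight `limb_weight`; all others get 1.
    """
    weights = np.ones(pred.shape[0])
    weights[LIMB_END_JOINTS] = limb_weight
    per_joint = np.sum((pred - target) ** 2, axis=-1)  # squared distance per joint
    return float(np.sum(weights * per_joint) / np.sum(weights))


def training_loss(main_pred, aux_pred_a, aux_pred_b, target, lambda_aux=0.5):
    """Main-head loss plus the weighted losses of two auxiliary heads.

    The auxiliary heads only shape the shared features during training;
    at inference, only `main_pred` would be used.
    """
    main = weighted_joint_loss(main_pred, target, limb_weight=1.0)
    aux = weighted_joint_loss(aux_pred_a, target) + weighted_joint_loss(aux_pred_b, target)
    return main + lambda_aux * aux


# Usage: a unit error on a single ankle joint.
target = np.zeros((17, 3))
pred = target.copy()
pred[3, 0] = 1.0
print(weighted_joint_loss(pred, target, limb_weight=1.0))  # 1/17 ≈ 0.0588
print(weighted_joint_loss(pred, target))                   # 2/21 ≈ 0.0952
```

Because auxiliary heads contribute only to the training loss, they can be discarded after training, so this style of supervision adds no inference-time cost.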
### Experimental Validation

- **Baseline Method**: For comparison, the authors built a two-stage baseline that first recovers images with a lensless image reconstruction method and then applies PyMAF to estimate human pose and shape.
- **Quantitative Evaluation**: On the LenslessHuman3.6M dataset, LPSNet significantly outperformed the baseline on multiple metrics, most notably MPJPE (Mean Per Joint Position Error) and PVE (Per-Vertex Error).
- **Qualitative Evaluation**: Visualized results show that LPSNet outperforms the baseline in complex scenarios, especially in limb estimation.

### Conclusion

The paper proposes and validates LPSNet, an end-to-end framework for estimating 3D human pose and shape directly from lensless measurements. Experimental results show that LPSNet achieves accurate human pose and shape estimation while preserving privacy and keeping hardware costs low.
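MPJPE and PVE, the metrics highlighted in the quantitative evaluation, are both mean Euclidean distances, usually reported in millimeters. The minimal NumPy sketch below assumes joints and SMPL-style mesh vertices are given as arrays of shape `(..., N, 3)`. It also assumes MPJPE is computed after aligning the root joint (index 0 here, an assumption) and that both meshes are already in the same aligned frame for PVE. Evaluation protocols differ in such details (PA-MPJPE, for instance, applies Procrustes alignment first), so this is illustrative rather than the paper's evaluation code.

```python
import numpy as np


def mpjpe(pred_joints: np.ndarray, gt_joints: np.ndarray, root: int = 0) -> float:
    """Mean Per Joint Position Error after root alignment.

    Both inputs have shape (..., J, 3). Each skeleton is translated so that
    its root joint sits at the origin before distances are measured.
    """
    pred = pred_joints - pred_joints[..., root:root + 1, :]
    gt = gt_joints - gt_joints[..., root:root + 1, :]
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def pve(pred_verts: np.ndarray, gt_verts: np.ndarray) -> float:
    """Per-Vertex Error: mean Euclidean distance between corresponding mesh
    vertices of shape (..., V, 3), assumed to be in the same aligned frame."""
    return float(np.mean(np.linalg.norm(pred_verts - gt_verts, axis=-1)))


# Usage with synthetic data (coordinates in millimeters).
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3)) * 100.0
pred = gt + np.array([50.0, 0.0, 0.0])  # global translation only: removed by root alignment
pred[5] += np.array([3.0, 4.0, 0.0])    # 5 mm error on one non-root joint
print(mpjpe(pred, gt))  # ≈ 0.294 (= 5 / 17)
```

Root alignment means a prediction that is correct up to a global translation scores zero MPJPE, which isolates articulation error from camera-relative placement.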