Abstract: Human pose and shape (HPS) estimation with lensless imaging not only benefits privacy protection but can also be used in covert surveillance scenarios, owing to the small size and simple structure of the device. However, the task is challenging because the captured measurements are inherently ambiguous and there are no effective methods for directly estimating human pose and shape from lensless data. In this paper, we propose what is, to our knowledge, the first end-to-end framework to recover 3D human poses and shapes from lensless measurements. We specifically design a multi-scale lensless feature decoder to decode the lensless measurements through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve the estimation accuracy of human limb ends. In addition, we build a lensless imaging system and verify the effectiveness of our method on various datasets acquired with it.

What problem does this paper attempt to address?

The paper attempts to address the problem of human pose and shape estimation from lensless imaging systems. Specifically, the authors propose an end-to-end framework named LPSNet, which aims to recover 3D human pose and shape directly from lensless measurement data without first converting the lensless measurement data into RGB images. Solving this problem not only helps protect privacy but can also be used in covert surveillance scenarios due to the small size, simple structure, and low cost of lensless imaging systems.

### Main Challenges

1. **Feature Extraction**: How to effectively extract features from lensless measurement data for human pose and shape estimation.
2. **Limb Estimation Accuracy**: In early experiments, it was found that the accuracy of limb estimation was poor when using features extracted from lensless measurement data for human pose and shape estimation.

### Solutions

1. **Multi-Scale Lensless Feature Decoder (MSFDecoder)**: A multi-scale lensless feature decoder was designed to effectively decode the information generated by the lensless imaging system and extract global features.
2. **Dual-Head Auxiliary Supervision Mechanism (DHAS)**: A dual-head auxiliary supervision mechanism was proposed to improve the accuracy of human limb estimation by adding auxiliary supervision during training.

### Main Contributions

1. **LPSNet**: Proposed LPSNet, the first end-to-end network to directly estimate 3D human pose and shape from lensless measurement data.
2. **MSFDecoder**: Designed a multi-scale lensless feature decoder that can efficiently extract features from lensless measurement data.
3. **DHAS**: Proposed a dual-head auxiliary supervision mechanism that can significantly improve the accuracy of human limb estimation.

### Experimental Validation

- **Baseline Method**: For comparison, the authors designed a two-stage baseline method that first uses a lensless image reconstruction method to recover images and then uses PyMAF for human pose and shape estimation.
- **Quantitative Evaluation**: On the LenslessHuman3.6M dataset, LPSNet significantly outperformed the baseline method on multiple metrics, especially on MPJPE and PVE metrics.
- **Qualitative Evaluation**: Visualized results show that LPSNet performs better than the baseline method in complex scenarios, especially in limb estimation.

### Conclusion

The paper successfully proposes and validates an end-to-end framework, LPSNet, for directly estimating 3D human pose and shape from lensless measurement data. Experimental results show that LPSNet can achieve high-precision human pose and shape estimation while protecting privacy and reducing costs.
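
### Illustration: MPJPE and PVE

The quantitative evaluation above refers to MPJPE (mean per-joint position error) and PVE (per-vertex error). Both are commonly computed as the average Euclidean distance between predicted and ground-truth 3D points, over skeleton joints for MPJPE and over mesh vertices for PVE. The following is a minimal sketch under that assumption. The function names and the toy coordinates are hypothetical, and any alignment step (such as root alignment) or unit convention used in the paper's evaluation protocol is not stated in this summary and is not modeled here.

```python
import math


def mean_point_error(pred_points, gt_points):
    """Average Euclidean distance between corresponding 3D points."""
    if len(pred_points) != len(gt_points) or not pred_points:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_points, gt_points))
    return total / len(pred_points)


def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error: the average is taken over joints."""
    return mean_point_error(pred_joints, gt_joints)


def pve(pred_vertices, gt_vertices):
    """Per-vertex error: the same average, taken over mesh vertices."""
    return mean_point_error(pred_vertices, gt_vertices)


# Toy example (hypothetical values): two joints, off by 3 and 4 units.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(mpjpe(pred, gt))  # 3.5
```

Lower values are better for both metrics. Because PVE averages over every mesh vertex, it reflects body shape errors as well as pose errors, whereas MPJPE only measures joint locations.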

LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution.

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

High-precision Human Body Acquisition Via Multi-View Binocular Stereopsis

LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation

3D Human Pose Estimation Based on Wearable IMUs and Multiple Camera Views

Reconstructing 3D human pose and shape from a single image and sparse IMUs

LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

3D Human Pose Estimation with Single Image and Inertial Measurement Unit (IMU) Sequence

RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling

LidPose: Real-Time 3D Human Pose Estimation in Sparse Lidar Point Clouds with Non-Repetitive Circular Scanning Pattern

EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans

DirectPose: Direct End-to-End Multi-Person Pose Estimation

Simultaneously-Collected Multimodal Lying Pose Dataset: Towards In-Bed Human Pose Monitoring under Adverse Vision Conditions

Hand Gestures Recognition in Videos Taken with Lensless Camera

LAMP: Leveraging Language Prompts for Multi-person Pose Estimation

Weakly Supervised 3D Multi-Person Pose Estimation for Large-Scale Scenes Based on Monocular Camera and Single LiDAR

Hand gestures recognition in videos taken with a lensless camera