Abstract: The masked autoencoder (MAE) is a widely used self-supervised learning method that has achieved great success in NLP and computer vision. However, the potential advantages of masked pre-training for point cloud understanding have not been fully explored. Preliminary MAE-based work on point clouds uses the Transformer architecture to explore low-level geometric representations in 3D space, which is insufficient for fine-grained decoding completion and downstream tasks. Inspired by multimodality, we propose Inter-MAE, an inter-modal MAE method for self-supervised learning on point clouds. Specifically, we first use Point-MAE as a baseline to randomly partition point clouds into a low percentage of visible point patches and a high percentage of masked point patches. Then, a standard Transformer-based autoencoder is built with an asymmetric design and shifting mask operations, and latent features are learned from the visible point patches with the aim of recovering the masked point patches. In addition, we render the point cloud and generate image features with ViT, which form inter-modal contrastive learning with the decoded features of the completed point patches. Extensive experiments show that the pre-trained models generated by Inter-MAE are effective and achieve superior results on various downstream tasks. For example, an accuracy of 85.4% is achieved on ScanObjectNN and 86.3% on ShapeNetPart, outperforming other state-of-the-art self-supervised learning methods. Notably, our work establishes for the first time the feasibility of applying the image modality to masked point clouds. The code is publicly available at https://github.com/ywu0912/TeamCode.git
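
The two ingredients named in the abstract, random patch masking and inter-modal contrastive learning, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `random_patch_mask` and `info_nce`, the 60% mask ratio, the temperature of 0.07, and the choice of a symmetric InfoNCE loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean array in which True marks a masked patch.

    Hypothetical helper: the mask ratio is an assumed hyperparameter.
    """
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask


def info_nce(point_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE loss between row-aligned feature matrices.

    Row i of point_feats and row i of image_feats are treated as a
    positive pair; all other rows in the batch act as negatives.
    This loss form is an assumption, not taken from the abstract.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = p @ v.T / temperature  # cosine similarities, scaled

    def cross_entropy_diag(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the point-to-image and image-to-point directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


rng = np.random.default_rng(0)

# Split 64 point patches into a masked majority and a visible minority.
mask = random_patch_mask(num_patches=64, mask_ratio=0.6, rng=rng)
visible = ~mask

# Stand-in features for a batch of 8 shapes (decoded point features and
# image features would come from the respective encoders in practice).
point_feats = rng.normal(size=(8, 32))
image_feats = point_feats + 0.1 * rng.normal(size=(8, 32))
loss = info_nce(point_feats, image_feats)
```

In an actual pipeline only the patches selected by `visible` would be passed to the encoder, and the contrastive term would be combined with a reconstruction loss on the masked patches; how the two terms are weighted is not stated in the abstract.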

Point-MPP: Point Cloud Self-Supervised Learning from Masked Position Prediction

Masked Autoencoders for Point Cloud Self-supervised Learning

M^3CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

Point Cloud Domain Adaptation Via Masked Local 3D Structure Prediction

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

Point‐AGM: Attention Guided Masked Auto‐Encoder for Joint Self‐supervised Learning on Point Clouds

PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders

A Simple Masked Autoencoder Paradigm for Point Cloud

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

Masked Autoencoders in 3D Point Cloud Representation Learning

Inter-Modal Masked Autoencoder for Self-Supervised Learning on Point Clouds

Triple Point Masking

Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning

PointCG: Self-supervised Point Cloud Learning via Joint Completion and Generation

PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

PointGame: Geometrically and Adaptively Masked Autoencoder on Point Clouds

Pre-training Point Cloud Compact Model with Partial-aware Reconstruction

LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds