Abstract: Driven by growing interest in 3D techniques and the resulting large-scale 3D data, 3D model classification has attracted enormous attention from both the research and industry communities. Most current methods depend heavily on sufficient labeled 3D models, which substantially restricts their scalability to novel classes with few annotated training samples, since scarce data increases the chance of overfitting. Moreover, they leverage only single-modal information (either point cloud or multi-view), and few works integrate these complementary sources for 3D model representation. To overcome these problems, we propose a multi-modal meta-transfer fusion network (M³TF), the key of which is to perform few-shot multi-modal representation for 3D model classification. Specifically, we first convert the original 3D data into both multi-view and point cloud modalities, and pre-train individual encoding networks on a large-scale dataset to obtain the optimal initial parameters, which benefits few-shot learning tasks. Then, to enable the network to adapt to few-shot learning tasks, we update the parameters of the Scaling and Shifting (SS) operation, the multi-modal representation fusion (MMRF) module and the 3D model classifier to obtain optimal initialization parameters. Since the large number of training parameters in the feature extractors increases the chance of overfitting, we freeze the feature extractors and introduce an SS operation to adjust their weights. Specifically, SS can reduce the number of training parameters up to 20%, which can effectively avoid overfitting. MMRF adaptively integrates the multi-modal information based on its significance to the 3D model, yielding a more robust 3D representation. Since no dataset is available for evaluation, we build three 3D CAD datasets, Meta-ModalNet, Meta-ShapeNet and Meta-RGBD, for this new task and implement representative methods for fair comparison. Extensive experimental results demonstrate the superiority of the proposed method.
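
The two mechanisms named in the abstract, adjusting frozen weights with SS and adaptively fusing modalities, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the per-output-unit `scale` and `shift` vectors, the softmax-weighted sum in the hypothetical `fuse` helper, and all toy values. It is not the authors' implementation.

```python
import math


def ss_linear(x, weight, bias, scale, shift):
    """Linear layer with frozen weight/bias, modulated by scale and shift.

    Assumption: one scale and one shift value per output unit. In the
    few-shot stage only these two vectors would be trained.
    """
    out = []
    for w_row, b, s, t in zip(weight, bias, scale, shift):
        out.append(sum(s * w * xi for w, xi in zip(w_row, x)) + b + t)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, scores):
    """Weighted sum of per-modality feature vectors (hypothetical fusion).

    `scores` stands in for learned per-modality significance values.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Toy frozen layer with 2 outputs and 3 inputs.
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]
x = [2.0, 4.0, 6.0]

# scale=1 and shift=0 reproduce the frozen layer exactly.
identity = ss_linear(x, weight, bias, scale=[1.0, 1.0], shift=[0.0, 0.0])
# Changing scale/shift adapts the output without touching weight or bias.
adapted = ss_linear(x, weight, bias, scale=[2.0, 1.0], shift=[0.5, -1.0])

# Parameter counts: the full layer versus the SS vectors only.
full_params = sum(len(row) for row in weight) + len(bias)
ss_params = 2 * len(bias)

# Equal scores give equal weight to the two modality features.
fused = fuse([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
```

In this toy layer the SS vectors hold 4 values against 8 in the full layer; the ratio for a real feature extractor depends on its architecture and is not derived here.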

Multimodal variational contrastive learning for few-shot classification

Multimodal few-shot classification without attribute embedding

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Adaptive Cross-Modal Few-Shot Learning

Supervised Momentum Contrastive Learning for Few-Shot Classification

Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints

Bimodal semantic fusion prototypical network for few-shot classification

Boosting Few-Shot Classification with View-Learnable Contrastive Learning

Dual-stream Multi-Modal Graph Neural Network for Few-Shot Learning

Multimodal Prototypical Networks for Few-shot Learning

Semantic-Based Few-Shot Learning by Interactive Psychometric Testing

Multimodal CLIP Inference for Meta-Few-Shot Image Classification

Binocular Mutual Learning for Improving Few-shot Classification

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

Contrastive prototype network with prototype augmentation for few-shot classification

MMI-ML: Maximize Mutual Information Between Different Views for Few-Shot Remote Sensing Image Classification

VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification

Variational Neuron Shifting for Few-Shot Image Classification Across Domains