Abstract: Driven by growing interest in 3D techniques and the resulting large-scale 3D data, 3D model classification has attracted enormous attention from both the research and industry communities. Most current methods depend heavily on sufficient labeled 3D models, which substantially restricts their scalability to novel classes with few annotated training samples, since scarce data increases the chance of overfitting. Moreover, they leverage only single-modal information (either point cloud or multi-view information), and few works integrate these complementary sources for 3D model representation. To overcome these problems, we propose a multi-modal meta-transfer fusion network (M³TF), the key of which is to perform few-shot multi-modal representation for 3D model classification. Specifically, we first convert the original 3D data into both multi-view and point cloud modalities and pre-train individual encoding networks on a large-scale dataset to obtain initial parameters that benefit few-shot learning tasks. Then, to enable the network to adapt to few-shot learning tasks, we update the parameters of the Scaling and Shifting operation (SS), the multi-modal representation fusion (MMRF) module and the 3D model classifier to obtain optimal initialization parameters. Since the large number of trainable parameters in the feature extractors would increase the chance of overfitting, we freeze the feature extractors and introduce an SS operation to adjust their weights. In particular, SS can reduce the number of training parameters up to 20%, which can effectively avoid overfitting. MMRF adaptively integrates the multi-modal information based on its significance to the 3D model for a more robust 3D representation. Since there is no available dataset for evaluation, we build three 3D CAD datasets, Meta-ModalNet, Meta-ShapeNet and Meta-RGBD, for this new task and implement representative methods for fair comparison. Extensive experimental results demonstrate the superiority of the proposed method.
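
The abstract does not give the exact form of the SS operation or of MMRF, so the following is only a minimal sketch under stated assumptions. It assumes SS multiplies each frozen weight row by a learnable scale and adds a learnable shift to the frozen bias (a formulation common in meta-transfer learning, not confirmed here), and it assumes the fusion is a softmax-weighted sum of per-modality features. All function names and toy values are hypothetical.

```python
import math

def ss_linear(x, frozen_w, frozen_b, scale, shift):
    """Frozen linear layer modulated by assumed Scaling and Shifting (SS).

    y_j = sum_i (frozen_w[j][i] * scale[j]) * x[i] + frozen_b[j] + shift[j]
    Only `scale` and `shift` would be trained; frozen_w / frozen_b stay fixed.
    """
    return [
        sum(w * scale[j] * xi for w, xi in zip(frozen_w[j], x))
        + frozen_b[j] + shift[j]
        for j in range(len(frozen_w))
    ]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features, scores):
    """Assumed fusion: weighted sum of equal-length modality features,
    with weights given by softmax(scores), one score per modality."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Hypothetical toy "pre-trained" parameters (2 inputs -> 2 outputs).
frozen_w = [[1.0, 2.0], [0.5, -1.0]]
frozen_b = [0.0, 1.0]
x = [1.0, 1.0]

# With scale = 1 and shift = 0 the layer reproduces the pre-trained one.
identity_out = ss_linear(x, frozen_w, frozen_b, [1.0, 1.0], [0.0, 0.0])

# Adapting only the SS parameters changes the output; frozen_w is untouched.
adapted_out = ss_linear(x, frozen_w, frozen_b, [2.0, 1.0], [0.0, 0.5])

# Equal scores give equal weights, so the fused feature is the plain average.
view_feat, point_feat = [1.0, 3.0], [3.0, 1.0]
fused = fuse([view_feat, point_feat], [0.0, 0.0])
```

In this toy layer the frozen part holds six values while SS adds four; for convolutional encoders, where one scale and one shift cover a whole kernel, the trainable share would be much smaller, which is the motivation the abstract gives for freezing the extractor.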

MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, South Korea, January 5–8, 2020, Proceedings, Part II

Multi-Scale Comparison Network For Few-Shot Learning

Dual-stream Multi-Modal Graph Neural Network for Few-Shot Learning

Multi-Similarity Enhancement Network for Few-Shot Segmentation

Learning to focus: cascaded feature matching network for few-shot image recognition

Multi-Content Interaction Network for Few-Shot Segmentation

Multi-distance Metric Network for Few-Shot Learning

Memory-Augmented Relation Network for Few-Shot Learning

Multi-Scale Adaptive Task Attention Network for Few-Shot Learning

Multi-scale Matching Networks for Semantic Correspondence

Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms

Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification

Multi-instance attention network for few-shot learning

Multi-scale Unified Network for Image Classification

Multi-local feature relation network for few-shot learning

Layer-Wise Mutual Information Meta-Learning Network for Few-Shot Segmentation

Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints

Multi-level Metric Learning for Few-Shot Image Recognition

A Multi-Layer Feature Fusion Method for Few-Shot Image Classification

Multi-scale Self-similarity Network for Few-Shot Segmentation

TMNIO: Triplet merged network with involution operators for improved few-shot image classification