Abstract: High-quality labeled data is critical to advancing machine learning, yet noise persists in widely used datasets. Noise learning has gained traction in machine learning, particularly in computer vision, but its exploration in text and multimodal classification has lagged. A comprehensive comparison of noise learning techniques in these domains has also been lacking, partly because experimental noise settings vary across prior studies. To address these gaps, this research introduces the Multimodal Infusion Joint Training (MinJoT) framework, which features a novel co-regularized loss function that integrates multimodal information during joint training and helps keep the model robust on noisy data. Adapting established noise learning methods from computer vision to text classification, the study conducts extensive experiments across five English and Chinese textual and multimodal datasets, involving four methods, five noise modes, and seven noise rates. This work also challenges the implicit assumption that widely used datasets are free of noise: these datasets contain noise levels ranging from 0.61% to 15.77%, defined in this study as intrinsic noise. For the first time, the study investigates the impact of intrinsic noise on model performance, categorizing it into distinct levels of ambiguity. To facilitate accurate method comparison, a new dataset, Golden-Chnsenticorp (G-Chnsenticorp), is introduced, carefully crafted to be free of intrinsic noise. This research establishes the first noise learning benchmark for text classification and the first noise learning framework tailored for multimodal sentiment classification. Through these contributions, the study advances the understanding of noise learning in text and multimodal contexts, providing a novel framework, benchmarks, and insights.
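
The abstract does not spell out its five noise modes or seven noise rates, so the following is only an illustrative sketch of one setting that is common in the noise learning literature: symmetric (uniform) label noise injected at a chosen rate. The function name `inject_symmetric_noise`, its parameters, and the toy labels are hypothetical and are not taken from the paper.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` where a `noise_rate` fraction is flipped.

    Assumption (not from the paper): each selected label is replaced by a
    different class drawn uniformly at random, i.e. symmetric noise.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Pick distinct positions so the realized noise rate matches the target.
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        # Exclude the original class so every selected label really changes.
        candidates = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(candidates)
    return noisy


# Toy binary sentiment labels (hypothetical data).
clean = [0, 1] * 50
noisy = inject_symmetric_noise(clean, num_classes=2, noise_rate=0.2)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

Because flipped positions are sampled without replacement and the original class is excluded, `observed_rate` equals the requested rate up to rounding. This makes it straightforward to compare methods under the same controlled noise rate, which is the kind of consistent setting the abstract says earlier studies lacked.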

MinJoT: Multimodal Infusion Joint Training for Noise Learning in Text and Multimodal Classification Problems
