Abstract: In this paper we study multi-label learning with weakly labeled data, i.e., training examples whose labels are incomplete, which commonly occurs in real applications such as image classification and document categorization. This setting includes, e.g., (i) semi-supervised multi-label learning, where only some of the training examples are completely labeled; (ii) weak label learning, where the relevant labels of examples are partially known; and (iii) extended weak label learning, where both the relevant and irrelevant labels of examples are partially known. Previous studies often expect that a learning method that uses weakly labeled data will improve performance, since more data are employed. This, however, is not always the case in practice: weakly labeled data may sometimes degrade learning performance. It is therefore desirable to learn safe multi-label predictions that will not hurt performance when weakly labeled data are involved in the learning procedure. In this work we optimize multi-label evaluation metrics (\(\hbox {F}_1\) score and Top-k precision) under the assumption that the ground-truth label assignment is realized by a convex combination of base multi-label learners. To cope with the infinite number of possible ground-truth label assignments, a cutting-plane strategy is adopted to iteratively generate the most helpful label assignments. The whole optimization is cast as a series of simple linear programs that can be solved efficiently. Extensive experiments on three weakly labeled learning tasks, namely (i) semi-supervised multi-label learning, (ii) weak label learning, and (iii) extended weak label learning, clearly show that our proposal improves the safeness of using weakly labeled data compared with many state-of-the-art methods.
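To make the quantities named in the abstract concrete, the following is a minimal sketch of the two evaluation metrics and of a convex combination of base learner outputs. It is an illustration only, not the paper's implementation: the function names (`f1_score`, `top_k_precision`, `convex_combination`) are hypothetical, the metrics are computed per example, and the convention of returning 1.0 when both label vectors are empty is an assumption. The abstract does not specify these details.

```python
from typing import Sequence


def f1_score(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """F1 between two binary label vectors for a single example."""
    # True positives: labels that are relevant and also predicted.
    tp = sum(1 for t, p in zip(true_labels, pred_labels) if t == 1 and p == 1)
    denom = sum(true_labels) + sum(pred_labels)
    # Assumption: two empty label sets count as a perfect match.
    return 2.0 * tp / denom if denom > 0 else 1.0


def top_k_precision(true_labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Fraction of the k highest-scored labels that are truly relevant."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sum(true_labels[j] for j in ranked) / k


def convex_combination(base_scores: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> list:
    """Combine per-label scores of several base learners with convex weights."""
    # Convex weights are non-negative and sum to one.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_labels = len(base_scores[0])
    return [sum(w * s[j] for w, s in zip(weights, base_scores))
            for j in range(n_labels)]


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    print(f1_score(y_true, [1, 1, 0, 0]))
    print(top_k_precision(y_true, [0.9, 0.8, 0.3, 0.1], k=2))
    print(convex_combination([[0.9, 0.1], [0.5, 0.5]], [0.5, 0.5]))
```

In this sketch the combination weights are fixed by hand. In the setting the abstract describes, the unknown ground-truth assignment is only assumed to lie in the convex hull of the base learners' outputs, so the weights are not known in advance.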

Weak-PMLC: A large-scale framework for multi-label policy classification based on extremely weak supervision

Open-world Multi-label Text Classification with Extremely Weak Supervision

Partial Multi-label Learning with Label and Feature Collaboration

Large Loss Matters in Weakly Supervised Multi-Label Classification

Learning Safe Multi-Label Prediction for Weakly Labeled Data

PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models

Entailment-Driven Privacy Policy Classification with LLMs

Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

A Survey on Programmatic Weak Supervision

Positive Label Is All You Need for Multi-Label Classification

Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels

Policy Learning Using Weak Supervision

Binary Classification with Positive Labeling Sources

Weak Learning Algorithm For Multi-Label Multiclass Text Categorization

Multiple weak supervision for short text classification

Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

Learning From Semi-Supervised Weak-Label Data

Weak Labeled Multi-Label Active Learning for Image Classification

PolyLM: An Open Source Polyglot Large Language Model

Automatic Image Annotation with Weakly Labeled Dataset

AutoWS: Automated Weak Supervision Framework for Text Classification