Abstract: Learning from Label Proportion (LLP) is a weakly supervised learning scenario in which training data is organized into predefined bags of instances, and only the class label proportions of each bag are disclosed. This paradigm is essential for user modeling and personalization, where user privacy is paramount, because it offers insights into user preferences without revealing individual data. LLP faces a unique difficulty: the misalignment between bag-level supervision and the objective of instance-level prediction, which stems primarily from the inherent ambiguity of label proportion matching. Previous studies have demonstrated that, in the image domain, deep representation learning can generate auxiliary signals that strengthen supervision. However, applying these techniques to tabular data presents significant challenges: 1) they rely heavily on label-invariant augmentation to construct multiple views, which is not feasible given the heterogeneous nature of tabular datasets, and 2) tabular datasets often lack sufficient semantics for perfect class distinction, making them prone to suboptimal solutions caused by the inherent ambiguity of label proportion matching. To address these challenges, we propose TabLLP-BDC, an augmentation-free contrastive framework that introduces class-aware supervision (supervision that is explicitly aware of class differences) at the instance level. Our solution features a two-stage Bag Difference Contrastive (BDC) learning mechanism that establishes robust class-aware instance-level supervision by decomposing the differences between bag label proportions, without relying on augmentations. In addition, our model introduces a multi-task pretraining pipeline tailored for tabular-based LLP, which captures intrinsic tabular feature correlations in alignment with the label proportion distribution. Extensive experiments demonstrate that TabLLP-BDC achieves state-of-the-art performance for LLP in the tabular domain.

Class-aware and Augmentation-free Contrastive Learning from Label Proportion

Learning under Label Proportions for Text Classification

Two-stage Training for Learning from Label Proportions

Learning from Label Proportions with Consistency Regularization

Learning from Label Proportions by Learning with Label Noise

Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions

OT-LLP: Optimal Transport for Learning from Label Proportions

MixBag: Bag-Level Data Augmentation for Learning from Label Proportions

LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions

Contrastive Label Enhancement

Theoretical Proportion Label Perturbation for Learning from Label Proportions in Large Bags

Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

Label Anchored Contrastive Learning for Language Understanding

Learning from Label Proportions and Covariate-shifted Instances

Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

Learning from Label Proportions with Instance-wise Consistency

Local Contrastive Feature learning for Tabular Data

Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition

Transfer Learning-Based Label Proportions Method with Data of Uncertainty

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

Bt-Vmf Contrastive and Collaborative Learning for Long-Tailed Visual Recognition