Abstract: Negative sampling is essential for implicit-feedback-based collaborative filtering: it constructs negative signals from massive unlabeled data to guide supervised learning. The state-of-the-art idea is to use hard negative samples, which carry more useful information, to form a better decision boundary. To balance efficiency and effectiveness, the vast majority of existing methods follow a two-pass approach: the first pass samples a fixed number of unobserved items from a simple static distribution, and the second pass selects the final negative items using a more sophisticated negative sampling strategy. However, selecting negative samples from the original items is inherently restrictive, and the selected samples may therefore not contrast well with positive samples. In this paper, we confirm this observation via experiments and identify two limitations of existing solutions: ambiguous trap and information discrimination. Our response to these limitations is to introduce augmented negative samples. This direction poses a substantial technical challenge, because constructing unconstrained negative samples may introduce excessive noise that distorts the decision boundary. To this end, we introduce a novel, generic augmented negative sampling paradigm and provide a concrete instantiation. First, we disentangle the hard and easy factors of negative items. Next, we generate new candidate negative samples by augmenting only the easy factors in a regulated manner, with the direction and magnitude of the augmentation carefully calibrated. Finally, we design an advanced negative sampling strategy to identify the final augmented negative samples; it considers not only the score function used in existing methods but also a new metric called augmentation gain. Extensive experiments on real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines.
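The two-pass approach that the abstract attributes to existing methods can be sketched roughly as follows. This is a minimal illustration, not the paper's method: it assumes a dot-product score function, a uniform first-pass distribution, and a "pick the highest-scoring candidate" second pass. The function name `two_pass_negative_sample` and its parameters are hypothetical.

```python
import numpy as np

def two_pass_negative_sample(user_vec, item_embs, observed, num_candidates=8, rng=None):
    """Return one hard negative item id for a user (illustrative sketch).

    Pass 1: draw candidates uniformly from the unobserved items
            (a simple static distribution).
    Pass 2: keep the candidate with the highest model score, assumed here
            to be the dot product between user and item embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_items = item_embs.shape[0]
    # Items without a recorded interaction are treated as unlabeled.
    observed_ids = np.fromiter(observed, dtype=int)
    unobserved = np.setdiff1d(np.arange(num_items), observed_ids)
    if unobserved.size == 0:
        raise ValueError("user has interacted with every item")
    # Pass 1: uniform sampling without replacement.
    k = min(num_candidates, unobserved.size)
    candidates = rng.choice(unobserved, size=k, replace=False)
    # Pass 2: the highest-scoring candidate is the hardest negative.
    scores = item_embs[candidates] @ user_vec
    return int(candidates[np.argmax(scores)])

# Usage with random embeddings (synthetic data, for illustration only).
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(50, 16))
user_vec = rng.normal(size=16)
negative = two_pass_negative_sample(user_vec, item_embs, {1, 5, 9}, rng=rng)
```

Note that both passes only ever return an item that already exists in the catalogue, which is the restriction the abstract argues against; the augmented variant it proposes would instead generate new candidate embeddings before the second pass.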
