RoNID: New Intent Discovery with Generated-Reliable Labels and Cluster-friendly Representations

Shun Zhang, Chaoran Yan, Jian Yang, Changyu Ren, Jiaqi Bai, Tongliang Li, Zhoujun Li
2024-04-18
Abstract: New Intent Discovery (NID) aims to identify known intents and reasonably deduce novel intent groups in the open-world scenario. However, current methods suffer from inaccurate pseudo-labels and poor representation learning, which create a negative feedback loop that degrades overall model performance, including accuracy and the Adjusted Rand Index. To address these challenges, we propose a Robust New Intent Discovery (RoNID) framework optimized by an EM-style method, which focuses on constructing reliable pseudo-labels and obtaining cluster-friendly discriminative representations. RoNID comprises two main modules: a reliable pseudo-label generation module and a cluster-friendly representation learning module. Specifically, the pseudo-label generation module assigns reliable synthetic labels by solving an optimal transport problem in the E-step, which provides high-quality supervised signals as input to the cluster-friendly representation learning module. To learn cluster-friendly representations with strong intra-cluster compactness and large inter-cluster separation, the representation learning module combines intra-cluster and inter-cluster contrastive learning in the M-step, feeding more discriminative features back into the generation module. RoNID can be performed iteratively to ultimately yield a robust model with reliable pseudo-labels and cluster-friendly representations. Experimental results on multiple benchmarks demonstrate that our method improves over previous state-of-the-art methods by a margin of 1 to 4 points.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses New Intent Discovery (NID) in open-world scenarios, where a model must both recognize known intents and group utterances into previously unseen intent categories. Current methods suffer from inaccurate pseudo-labels and poor representation learning; the two problems reinforce each other, because noisy labels produce weak features and weak features produce noisier labels. To tackle this, the paper proposes RoNID, a framework that alternates between two modules in an EM-style loop. In the E-step, the pseudo-label generation module assigns labels by solving an optimal transport problem. In the M-step, the representation learning module combines intra-cluster contrastive learning (for compactness) with inter-cluster contrastive learning (for separation), and the resulting features are fed back into the next round of label generation. Experimental results on multiple benchmarks show that RoNID outperforms previous state-of-the-art methods by 1 to 4 points.
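To make the E-step more concrete, the following is a minimal sketch of how pseudo-labels can be obtained from an optimal transport problem. The abstract does not say which solver or constraints RoNID uses, so everything here is an assumption for illustration: the Sinkhorn-Knopp iteration, the equal-cluster-size constraint, and the names `sinkhorn_pseudo_labels`, `epsilon`, and `n_iters` are hypothetical and not taken from the paper.

```python
import numpy as np


def sinkhorn_pseudo_labels(logits, epsilon=0.05, n_iters=50):
    """Balanced soft assignment of N samples to K clusters (Sinkhorn-Knopp).

    Assumption: clusters are constrained to receive equal total mass.
    `logits` is an (N, K) array of sample-to-cluster scores.
    """
    # Shift by the global max for numerical stability, then apply the
    # entropic regularisation temperature `epsilon`.
    q = np.exp((logits - logits.max()) / epsilon)
    q /= q.sum()
    n, k = q.shape
    for _ in range(n_iters):
        # Rescale columns so every cluster holds total mass 1/K.
        q /= q.sum(axis=0, keepdims=True) * k
        # Rescale rows so every sample holds total mass 1/N.
        q /= q.sum(axis=1, keepdims=True) * n
    q *= n  # each row now sums to 1, i.e. a distribution over clusters
    return q, q.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for similarities between utterance embeddings and prototypes.
    scores = rng.uniform(-1.0, 1.0, size=(12, 3))
    soft, hard = sinkhorn_pseudo_labels(scores, epsilon=1.0)
    print(soft.sum(axis=0))  # per-cluster mass, close to N / K
    print(hard)              # hard pseudo-labels
```

In an EM-style loop of the kind the paper describes, the hard or soft assignments returned here would serve as supervision for the representation learning step, and the updated features would produce new scores for the next assignment round.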