Abstract: Clustering analysis is one of the main concerns in data mining. A common approach to clustering is to bring together points that are close to each other and to separate points that are far apart. Measuring the distance between sample points is therefore crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples with those features is a common supervised approach to reconstructing the distance metric. However, in many application scenarios it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high-dimensional data, a novel semi-supervised clustering algorithm is proposed. It uses an improved prototype network that attempts to reconstruct the distance metric in the sample space from a small amount of pairwise supervised information, such as Must-Link and Cannot-Link constraints, and then clusters the data in the new metric space. The core idea is to use an embedding mapping to bring similar samples closer together and push dissimilar samples further apart. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm. On average, clustering metrics across the datasets improve by 8% over the comparison algorithm.
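The core idea of pulling Must-Link pairs together and pushing Cannot-Link pairs apart through an embedding can be illustrated with a minimal sketch. This is not the paper's improved prototype network: it swaps in a plain linear map `z = x @ W` trained by gradient descent on a contrastive-style pairwise loss. The function names, the `margin` parameter, the learning rate, and the four-point toy dataset are all assumptions made for illustration.

```python
import numpy as np

def pairwise_loss_and_grad(W, X, must_link, cannot_link, margin):
    """Loss and gradient w.r.t. W for the linear embedding Z = X @ W.

    Must-Link pairs are penalised by their squared embedded distance;
    Cannot-Link pairs are penalised only while closer than `margin`.
    """
    Z = X @ W
    loss = 0.0
    grad_Z = np.zeros_like(Z)
    for i, j in must_link:
        diff = Z[i] - Z[j]
        loss += diff @ diff
        grad_Z[i] += 2 * diff
        grad_Z[j] -= 2 * diff
    for i, j in cannot_link:
        diff = Z[i] - Z[j]
        dist = np.sqrt(diff @ diff) + 1e-12  # avoid division by zero
        if dist < margin:
            loss += (margin - dist) ** 2
            g = -2 * (margin - dist) * diff / dist
            grad_Z[i] += g
            grad_Z[j] -= g
    # Chain rule: Z = X @ W, so dL/dW = X^T @ dL/dZ
    return loss, X.T @ grad_Z

def fit_linear_embedding(X, must_link, cannot_link, dim,
                         margin=2.0, lr=0.01, steps=500):
    """Plain gradient descent, starting from a truncated identity map."""
    W = np.eye(X.shape[1], dim)
    for _ in range(steps):
        _, grad = pairwise_loss_and_grad(W, X, must_link, cannot_link, margin)
        W -= lr * grad
    return W

# Hypothetical toy data: the class depends only on feature 0, while
# feature 1 has a larger spread and dominates the raw Euclidean distance.
X = np.array([[0.0, 0.0], [0.0, 4.0], [1.0, 0.0], [1.0, 4.0]])
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]

W = fit_linear_embedding(X, must_link, cannot_link, dim=2)
Z = X @ W

raw_same = np.linalg.norm(X[0] - X[1])   # same class, raw space
raw_other = np.linalg.norm(X[0] - X[2])  # different class, raw space
emb_same = np.linalg.norm(Z[0] - Z[1])   # same class, embedded space
emb_other = np.linalg.norm(Z[0] - Z[2])  # different class, embedded space
print("raw:", raw_same, raw_other)
print("embedded:", emb_same, emb_other)
```

In the raw space, point 0 is closer to a point from the other class than to its Must-Link partner. After fitting, the ordering is reversed, so a standard distance-based clusterer run on `Z` rather than `X` would group the points according to the pairwise constraints.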
