Semi-supervised K-Means Clustering for Multi-Type Relational Data

Ying Gao, Hong Qi, Da-You Liu, He Liu
DOI: https://doi.org/10.1109/icmlc.2008.4620425
2008-01-01
Journal of Software
Abstract: In many data mining tasks, unlabeled data is plentiful but labeled data is scarce, because labels are expensive to generate. A number of semi-supervised clustering algorithms have therefore been proposed, but few of them are specifically designed for multi-type relational data. This paper proposes a semi-supervised k-means clustering algorithm for multi-type relational data that combines the semi-supervised k-means method with multi-type relational data clustering. To achieve high performance, the algorithm first analyzes all kinds of relationships in the data, including intra-relationships, inter-relationships, and explicit and implicit relationships. It then extends the k-means clustering algorithm with seeding and new similarity measures that draw on attribute information, labeled data, and all of these relationships. The experimental results show the effectiveness of the method.
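The seeding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only generic seeded k-means, in which the few labeled points fix the initial centroids, and it uses plain squared Euclidean distance on attribute vectors in place of the paper's similarity measures, which also account for relationships between objects. The names `seeded_kmeans`, `sq_dist`, and `mean`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Sequence


def mean(points: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance.

    Assumption: this stands in for the paper's similarity measure, which
    additionally uses relational information not modeled here.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    points: Sequence[Sequence[float]],
    seeds: Dict[int, int],
    k: int,
    max_iter: int = 100,
) -> List[int]:
    """Cluster `points` into `k` groups, initializing centroids from seeds.

    `seeds` maps the index of a labeled point to its cluster label (0..k-1).
    Every cluster needs at least one seed in this sketch.
    """
    # Seeding step: each initial centroid is the mean of that cluster's seeds.
    centroids = []
    for c in range(k):
        members = [points[i] for i, label in seeds.items() if label == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed points")
        centroids.append(mean(members))

    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment


# Toy data: two well-separated groups, with one labeled point in each.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_seeds = {0: 0, 3: 1}
toy_labels = seeded_kmeans(toy_points, toy_seeds, k=2)
```

In this variant the seed labels influence only the initialization, so a seed may later move to another cluster. A constrained variant would keep the seed assignments fixed during the assignment step.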