Sparse Non-Negative Matrix Factorization for Uncertain Data Clustering

Danyang Chen,Xiangyu Wang,Xiu Xu,Cheng Zhong,Jinhui Xu
DOI: https://doi.org/10.3233/ida-205622
IF: 1.7
2022-01-01
Intelligent Data Analysis
Abstract:We consider the problem of clustering a set of uncertain data, where each data consists of a point-set indicating its possible locations. The objective is to identify the representative for each uncertain data and group them into k clusters so as to minimize the total clustering cost. Different from other models, our model does not assume that there is a probability distribution for each uncertain data. Thus, all possible locations need to be considered to determine the representative. Existing methods for this problem are either impractical or have difficulty to handle large-scale datasets due to their pairwise-distance based global search strategy and expensive optimization computation. In this paper, we propose a novel sparse Non-negative Matrix Factorization (NMF) method which measures the similarity of uncertain data by their most commonly shared features. A divide-and-conquer approach is adopted to remarkably improve the efficiency. A novel diagonal l0-constraint and its l1 relaxation are proposed to overcome the challenge of determining the representatives. We give a detailed analysis to show the correctness of our method, and provide an effective initialization and peeling strategy to enhance the ability of processing large-scale datasets. Experimental results on some benchmark datasets confirm the effectiveness of our method.
What problem does this paper attempt to address?