On the Equivalence of Nonnegative Matrix Factorization and K-means - Spectral Clustering
Chris Ding, Xiaofeng He, Horst D. Simon, Rong Jin
2005-01-01
Abstract (December 4, 2005): We provide a systematic analysis of nonnegative matrix factorization (NMF) relating to data clustering. We generalize the usual $X = FG^T$ decomposition to the symmetric $W = HH^T$ and $W = HSH^T$ decompositions. We show that (1) $W = HH^T$ is equivalent to Kernel K-means clustering and the Laplacian-based spectral clustering; (2) $X = FG^T$ is equivalent to simultaneous clustering of rows and columns of a bipartite graph. We emphasize the importance of orthogonality in NMF and the soft clustering nature of NMF. These results are verified with experiments on face images and newsgroups.

Introduction

Standard factorization of a data matrix uses singular value decomposition (SVD), as widely used in principal component analysis (PCA). However, for many datasets such as images and text, the original data matrices are nonnegative. A factorization such as SVD contains negative entries and is therefore difficult to interpret for some applications. In contrast, nonnegative matrix factorization (NMF) [18, 19] restricts the entries of the matrix factors to be nonnegative. NMF has recently been shown to be useful for many applications in environmental science [25], chemometrics [29], pattern recognition [20], multimedia [6], text mining [31, 26] and DNA gene expressions [3]. It has also been extended to classification [27]. A number of studies focus on further developing NMF computational methodologies [15, 22, 26, 5, 21].

Let $X = (x_1, \ldots, x_n) \in \mathbb{R}_+^{p \times n}$ be the data matrix of nonnegative elements. In image processing, each column $x_i$ represents a 2D array of pixel gray levels. In text processing, each column is a document. NMF factorizes $X$ into two nonnegative matrices,

$$X \approx FG^T,$$

where $F = (f_1, \ldots, f_k) \in \mathbb{R}_+^{p \times k}$ and $G = (g_1, \ldots, g_k) \in \mathbb{R}_+^{n \times k}$; $k$ is a pre-specified parameter.
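To make the factorization of X into nonnegative factors F and G concrete, the following is a minimal sketch in plain Python. It assumes a squared Frobenius-norm objective and the multiplicative update rules commonly attributed to Lee and Seung; the text cites their work but does not state an algorithm at this point, so this choice is an assumption. The helper names (`matmul`, `transpose`, `squared_error`, `nmf`), the `eps` guard against division by zero, and the toy matrix are all illustrative and do not come from the paper.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def squared_error(X, F, G):
    """Squared Frobenius norm of X - F G^T."""
    R = matmul(F, transpose(G))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative p-by-n matrix X by F G^T.

    F is p-by-k and G is n-by-k; both stay nonnegative because they
    start positive and are only ever multiplied by nonnegative ratios.
    """
    rng = random.Random(seed)
    p, n = len(X), len(X[0])
    F = [[rng.random() + 0.1 for _ in range(k)] for _ in range(p)]
    G = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(iters):
        # F <- F * (X G) / (F G^T G), elementwise
        num = matmul(X, G)
        den = matmul(F, matmul(transpose(G), G))
        F = [[f * a / (b + eps) for f, a, b in zip(fr, nr, dr)]
             for fr, nr, dr in zip(F, num, den)]
        # G <- G * (X^T F) / (G F^T F), elementwise
        num = matmul(Xt, F)
        den = matmul(G, matmul(transpose(F), F))
        G = [[g * a / (b + eps) for g, a, b in zip(gr, nr, dr)]
             for gr, nr, dr in zip(G, num, den)]
    return F, G


if __name__ == "__main__":
    # Toy data: 4 features (rows), 6 data points (columns), two blocks.
    X = [[5, 5, 5, 0, 0, 0],
         [4, 4, 4, 0, 0, 0],
         [0, 0, 0, 3, 3, 3],
         [0, 0, 0, 6, 6, 6]]
    F, G = nmf(X, k=2)
    print("squared error:", squared_error(X, F, G))
    # Row i of G can be read as soft cluster weights for column x_i;
    # taking the largest weight gives a hard assignment.
    print("labels:", [row.index(max(row)) for row in G])
```

Reading each row of G as a set of soft cluster weights for the corresponding column of X is one way to see the soft clustering nature of NMF mentioned in the abstract. The result depends on the random initialization, so different seeds may yield different factors.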
NMF can be traced back to the 1970s (communication from Gene Golub) and has been studied by Paatero [25, 29]. The work of Lee and Seung [18, 19] brought much attention to NMF in the machine learning and data mining communities. There appears to be some confusion, however. Lee and Seung emphasize [18] that the NMF factors $f_k$ contain coherent parts of the original data (images), for example a nose or an eye. Later experiments [16, 20] do not support this parts-of-whole interpretation of NMF. In fact, Hoyer [16] and Li et al. [20] specifically propose sparsification schemes to achieve the parts-of-whole pictures.

Affiliations: ∗ Chris Ding, Xiaofeng He, and Horst D. Simon: Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720. † Rong Jin: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824.