Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions

Tianyi Zhou,Wei Bian,Dacheng Tao
DOI: https://doi.org/10.1109/ICDM.2013.29
2013-01-01
Abstract:Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with unique solution under separability assumption, which postulates all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and results in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from low-dimensional geometry of the data points even when their high dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving math cal O(klog k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
What problem does this paper attempt to address?