Semi-supervised Symmetric Non-negative Matrix Factorization with Low-Rank Tensor Representation

Yuheng Jia, Jia-Nan Li, Wenhui Wu, Ran Wang
2024-10-27
Abstract: Semi-supervised symmetric non-negative matrix factorization (SNMF) utilizes the available supervisory information (usually in the form of pairwise constraints) to improve the clustering ability of SNMF. Previous methods introduce the pairwise constraints from a local perspective, i.e., they either directly refine the similarity matrix element-wise or restrain the distance of the decomposed vectors in pairs according to the pairwise constraints. This overlooks the global perspective, i.e., in the ideal case, the pairwise constraint matrix and the ideal similarity matrix possess the same low-rank structure. To this end, we first propose a novel semi-supervised SNMF model by seeking a low-rank representation for the tensor synthesized from the pairwise constraint matrix and a similarity matrix obtained as the product of the embedding matrix and its transpose, which can strengthen those two matrices simultaneously from a global perspective. We then propose an enhanced SNMF model, making the embedding matrix tailored to the above tensor low-rank representation. We finally refine the similarity matrix by the strengthened pairwise constraints. We repeat the above steps to continuously boost the similarity matrix and the pairwise constraint matrix, leading to a high-quality embedding matrix. Extensive experiments substantiate the superiority of our method. The code is available at https://github.com/JinaLeejnl/TSNMF.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to use supervisory information from a global perspective to improve clustering performance in semi-supervised symmetric non-negative matrix factorization (SSNMF). Existing SSNMF methods mainly introduce pairwise constraints (Must-Link, ML, and Cannot-Link, CL) from a local perspective, which ignores the global low-rank structure shared by the pairwise constraint matrix and the ideal similarity matrix. The paper therefore constructs a three-dimensional tensor and imposes a tensor low-rank representation (TLRR) on it, so that the pairwise constraint matrix and the similarity matrix are enhanced simultaneously from a global perspective, which improves clustering.

Specifically, the main contributions of the paper are as follows:

1. **Using supervision information from a global perspective**: By constructing a three-dimensional tensor containing the pairwise constraint matrix \( \mathbf{Z} \) and the similarity matrix \( \mathbf{A} \), and imposing tensor low-rank representation, the pairwise constraint matrix and the input similarity matrix are simultaneously enhanced.
2. **Enhanced SNMF model**: An enhanced SNMF model is proposed, which makes the learned embedding matrix have a higher rank to meet the requirements of tensor low-rank representation. Given the same input, this model is more robust than traditional SNMF and can generate high-quality embedding matrices.
3. **Experimental verification**: The robustness and effectiveness of the proposed method are verified through a series of experiments. The results show that the method outperforms nine existing state-of-the-art methods.

### Paper background

Symmetric non-negative matrix factorization (SNMF) is a commonly used graph clustering method.
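To make the SNMF setting concrete, here is a minimal sketch of plain (unsupervised) SNMF, which approximates a non-negative similarity matrix S by V Vᵀ with V ≥ 0. It assumes a damped multiplicative update rule that is commonly used for SNMF; it is not the paper's enhanced or semi-supervised model. The function name `snmf`, its parameters, and the toy two-cluster matrix are illustrative assumptions.

```python
import numpy as np

def snmf(S, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative S by V @ V.T with V >= 0.

    Assumption: damped multiplicative updates (beta = 0.5), a common
    choice for SNMF, not the solver used in the paper.
    """
    rng = np.random.default_rng(seed)
    V = rng.random((S.shape[0], k)) + 0.1  # strictly positive start
    for _ in range(n_iter):
        numer = S @ V                      # attraction term S V
        denom = V @ (V.T @ V) + eps        # V V^T V, eps avoids division by zero
        V = V * ((1.0 - beta) + beta * numer / denom)
    return V

# Hypothetical toy input: an ideal similarity matrix for two clusters of three points.
groups = np.array([0, 0, 0, 1, 1, 1])
S = (groups[:, None] == groups[None, :]).astype(float)

V = snmf(S, k=2)
labels = V.argmax(axis=1)                  # cluster = largest embedding entry per row
error = np.linalg.norm(S - V @ V.T)        # Frobenius reconstruction error
print(labels, float(error))
```

Each row of V is the embedding of one sample, and taking the index of its largest entry is one simple way to read off a cluster assignment. Which column ends up representing which cluster depends on the random initialization.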
SNMF clusters by approximating a non-negative similarity matrix \( \mathbf{S} \) with the product \( \mathbf{V} \mathbf{V}^\top \) of a non-negative embedding matrix \( \mathbf{V} \) and its transpose. Semi-supervised SNMF (SSNMF) improves clustering performance by introducing pairwise constraints (such as ML and CL). However, existing methods mainly introduce these constraints from a local perspective and ignore the global structure, so the supervisory information is underused.

### Method overview

1. **Tensor low-rank representation**:
   - Construct a three-dimensional tensor \( \mathbf{C} \), where the first slice is the pairwise constraint matrix \( \mathbf{Z} \) and the second slice is the similarity matrix \( \mathbf{A} \).
   - Impose tensor low-rank representation. The optimization objective is:
     \[ \min_{\mathbf{C}, \mathbf{A}, \mathbf{Z}, \mathbf{E}} \|\mathbf{C}\|_\oplus + \lambda \|\mathbf{E}\|_F^2, \]
     where \(\|\mathbf{C}\|_\oplus\) is the tensor nuclear norm of \(\mathbf{C}\), \(\|\mathbf{E}\|_F^2\) is the squared Frobenius norm of \(\mathbf{E}\), and \(\mathbf{A}_0=\mathbf{A}+\mathbf{E}\).
2. **Enhanced SNMF model**:
   - Given the similarity matrix \(\mathbf{S}\), first decompose it into a set of embedding matrices \(\{\mathbf{V}_i\}_{i = 1}^m\).
   - Weight these embedding matrices by an adaptive weight vector \(\alpha\) to construct the final embedding matrix \(\mathbf{V}^*\).
   - The optimization objective function is: \[ \min_{\alpha,\{\mathbf{V}_i\}_{i = 1}^m,\mathbf{V}^*} \sum_{