Abstract: Low-dimensional nonlinear structure abounds in datasets across computer vision and machine learning. Kernelized matrix factorization techniques have recently been proposed to learn these nonlinear structures for denoising, classification, dictionary learning, and missing data imputation, by observing that the image of the matrix in a sufficiently large feature space is low-rank. However, these nonlinear methods fail in the presence of sparse noise or outliers. In this work, we propose a new robust nonlinear factorization method called Robust Non-Linear Matrix Factorization (RNLMF). RNLMF constructs a dictionary for the data space by factoring a kernelized feature space; a noisy matrix can then be decomposed as the sum of a sparse noise matrix and a clean data matrix that lies in a low-dimensional nonlinear manifold. RNLMF is robust to sparse noise and outliers and scales to matrices with thousands of rows and columns. Empirically, RNLMF achieves noticeable improvements over baseline methods in denoising and clustering.
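The observation that a matrix can be low-rank in feature space without being low-rank in the original space can be checked numerically. The sketch below is illustrative only: the toy curve, the degree-2 polynomial kernel, and the helper `poly_kernel` are assumptions chosen for the demonstration, not taken from the paper. Points are placed on a closed curve in \(\mathbb{R}^6\) parameterized by one angle \(t\). The data matrix has full row rank 6, so there is no linear low-rank structure, yet the degree-2 feature map (28 monomials of degree at most 2) produces a Gram matrix of rank only 13: every such monomial, restricted to the curve, is a trigonometric polynomial in \(t\) of frequency at most 6, and those span \(2\cdot 6+1=13\) dimensions.

```python
import numpy as np

def poly_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel k(a, b) = (a^T b + c)^degree between the columns of A and B."""
    return (A.T @ B + c) ** degree

# Hypothetical toy data: n points on a closed curve in R^6, driven by a single angle t.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
# Rows: cos t, sin t, cos 2t, sin 2t, cos 3t, sin 3t  -> X has shape (6, n)
X = np.vstack([f(k * t) for k in (1, 2, 3) for f in (np.cos, np.sin)])

# Gram matrix K = phi(X)^T phi(X); its rank equals the rank of phi(X).
K = poly_kernel(X, X)

print(np.linalg.matrix_rank(X))  # 6: no linear low-rank structure
print(np.linalg.matrix_rank(K))  # 13: phi(X) is low-rank in the 28-dim feature space
```

Working with the Gram matrix rather than explicit features is the usual kernel trick: the rank of \(\phi(X)\) is read off \(K\) without ever listing the 28 monomials.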
What problem does this paper attempt to address?
The core problem this paper addresses is the poor performance of existing nonlinear matrix factorization methods on data that contain sparse noise or outliers. Specifically, the authors point out that although current kernelized matrix factorization techniques can capture low-dimensional nonlinear structure in data, they fail when sparse noise or outliers are present. The paper therefore proposes a new method, Robust Non-Linear Matrix Factorization (RNLMF), to overcome this limitation.
The main objectives of RNLMF are:
1. **Construct a robust dictionary learning framework**: By performing matrix factorization in the kernel feature space, RNLMF extracts from the original data a clean data matrix that lies on a low-dimensional nonlinear manifold.
2. **Separate sparse noise and outliers**: RNLMF decomposes a noise-contaminated matrix into a sparse noise matrix and a clean data matrix, which denoises the data.
3. **Scale to large datasets**: RNLMF runs efficiently on matrices with thousands of rows and columns.
### Specific Problem Description
In practice, many datasets in fields such as computer vision and machine learning exhibit low-dimensional nonlinear structure. Although existing kernel-based methods can capture this structure, they are not effective in the presence of sparse noise or outliers. For example:
- **Sparse noise**: A small number of entries may be corrupted by high-amplitude noise, and these few corrupted entries can severely degrade what the model learns.
- **Outliers**: The dataset may contain samples that differ markedly from the normal data distribution, and these outliers also interfere with the model's performance.
To address these problems, RNLMF proposes a new optimization model that performs dictionary learning, denoising, and clustering simultaneously in the kernel feature space while remaining robust to sparse noise and outliers.
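The two corruption types can be made concrete with a small simulation. This is a sketch under stated assumptions: entrywise sparse noise is modeled as random \(\pm\) spikes of fixed magnitude added to a small fraction of entries, outliers are modeled as whole columns replaced by unrelated Gaussian vectors, and the helpers `add_sparse_noise` and `add_outlier_columns` are hypothetical names, not part of the paper. The last lines illustrate why sparse noise hurts methods with a squared (Frobenius) loss: a few spikes carry far more squared-error energy than small dense noise spread over every entry.

```python
import numpy as np

def add_sparse_noise(X, density, magnitude, rng):
    """Hypothetical model: return (X + E, E), where a random `density` fraction
    of E's entries equal +/- `magnitude` and all other entries are zero."""
    mask = rng.random(X.shape) < density
    signs = rng.choice([-1.0, 1.0], size=X.shape)
    E = np.where(mask, magnitude * signs, 0.0)
    return X + E, E

def add_outlier_columns(X, num_outliers, scale, rng):
    """Hypothetical model: return a copy of X in which `num_outliers` random
    columns are replaced by Gaussian vectors unrelated to the clean data."""
    X_out = X.copy()
    cols = rng.choice(X.shape[1], size=num_outliers, replace=False)
    X_out[:, cols] = scale * rng.standard_normal((X.shape[0], num_outliers))
    return X_out, cols

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.vstack([np.cos(t), np.sin(t), np.cos(2 * t)])  # clean points on a curve in R^3

X_hat, E = add_sparse_noise(X, density=0.05, magnitude=10.0, rng=rng)
X_out, outlier_cols = add_outlier_columns(X, num_outliers=10, scale=3.0, rng=rng)

# Compare squared-error energy of the sparse spikes with dense Gaussian noise (std 0.1).
dense = 0.1 * rng.standard_normal(X.shape)
print(f"corrupted entries: {np.mean(E != 0):.1%}")
print(f"sparse / dense noise energy: {np.sum(E**2) / np.sum(dense**2):.0f}")
```

With about 5% of entries hit by spikes of size 10, the expected energy ratio is roughly \(0.05\cdot 100/0.01 = 500\), so a least-squares fit is dominated by the corrupted entries rather than by the clean structure.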
### Mathematical Model
The core idea of RNLMF is to perform matrix factorization in the kernel feature space. Assume that the observed data matrix \(\hat{X}\in\mathbb{R}^{m\times n}\) is the sum of a clean data matrix \(X\in\mathbb{R}^{m\times n}\) and a sparse noise matrix \(E\in\mathbb{R}^{m\times n}\):
\[
\hat{X}=X + E
\]
where \(X\) is assumed to be low-rank in the feature space:
\[
\phi(X)\approx\phi(D)C
\]
Here, \(\phi(\cdot)\) is the feature mapping induced by the kernel function (applied column-wise), \(D\in\mathbb{R}^{m\times d}\) is the dictionary matrix, \(C\in\mathbb{R}^{d\times n}\) is the coefficient matrix, and \(d <\min(m, n)\). RNLMF obtains the robust nonlinear factorization by minimizing the following objective function:
\[
\min_{D, C, E}L(D, C, E)+R(D, C, E)
\]
where,
\[
L(D, C, E)=\frac{1}{2}\|\phi(\hat{X}-E)-\phi(D)C\|_F^2
\]
\[
R(D, C, E)=\lambda_D R(D)+\lambda_C R(C)+\lambda_E R(E)
\]
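Because \(\phi\) is never formed explicitly, the loss \(L\) has to be evaluated through kernel values. Writing \(Y=\hat{X}-E\) with columns \(y_j\), \(c_j\) for the columns of \(C\), and \(K_{DD}=\phi(D)^\top\phi(D)\), each column contributes \(\|\phi(y_j)-\phi(D)c_j\|^2 = k(y_j,y_j) - 2\,k(D,y_j)^\top c_j + c_j^\top K_{DD}\,c_j\). The NumPy sketch below evaluates that expansion. The Gaussian (RBF) kernel and the names `rbf_kernel` and `kernel_loss` are illustrative assumptions, and the sketch only evaluates \(L\); it does not reproduce the paper's optimization procedure.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) between columns of A and B."""
    sq_dists = (
        np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def kernel_loss(X_hat, E, D, C, kernel):
    """Evaluate 0.5 * ||phi(X_hat - E) - phi(D) C||_F^2 using kernel calls only."""
    Y = X_hat - E
    # Diagonal terms k(y_j, y_j); computed per column to avoid an n x n Gram matrix.
    k_yy = sum(kernel(Y[:, [j]], Y[:, [j]])[0, 0] for j in range(Y.shape[1]))
    K_dy = kernel(D, Y)  # (d, n): entry (i, j) = k(d_i, y_j)
    K_dd = kernel(D, D)  # (d, d)
    cross = np.sum(K_dy * C)          # sum_j k(D, y_j)^T c_j
    quad = np.sum(C * (K_dd @ C))     # sum_j c_j^T K_DD c_j
    return 0.5 * (k_yy - 2.0 * cross + quad)

# Usage on random data; the dictionary is initialized from data columns (a hypothetical choice).
rng = np.random.default_rng(0)
m, n, d = 3, 50, 10
X_hat = rng.standard_normal((m, n))
E = np.zeros((m, n))
D = X_hat[:, rng.choice(n, size=d, replace=False)]
C = 0.1 * rng.standard_normal((d, n))
print(kernel_loss(X_hat, E, D, C, rbf_kernel))
```

With the linear kernel \(k(a,b)=a^\top b\), `kernel_loss` reduces to \(\frac12\|\hat{X}-E-DC\|_F^2\), which makes a convenient sanity check; with a nonlinear kernel, the cost is dominated by the \(d\times n\) and \(d\times d\) kernel blocks rather than by any explicit feature dimension.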
The regularization terms \(R(D)\), \(R(C)\), and \(R(E)\), weighted by \(\lambda_D\), \(\lambda_C\), and \(\lambda_E\), control the complexity of the dictionary, coefficient, and noise matrices, respectively. Their specific forms can be chosen to suit the application. For example:
- For \(E\), we can choose \(R(E)=\|E\|_1\) to handle sparse noise.
- For \(C\), we can choose \(R(C)=\|C\|_1\) or \(R(C)=\|C\|_*\) to encourage...