Optimal Neighborhood Multiple Kernel Clustering with Adaptive Local Kernels

Jiyuan Liu, Xinwang Liu, Jian Xiong, Qing Liao, Sihang Zhou, Siwei Wang, Yuexiang Yang
DOI: https://doi.org/10.1109/tkde.2020.3014104
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Multiple kernel clustering (MKC) algorithms aim to group data into different categories by optimally integrating information from a group of pre-specified kernels. Although existing MKC algorithms have demonstrated their superiority in various applications, we observe that they usually do not sufficiently consider the local density around individual data samples and excessively limit the representation capacity of the learned optimal kernel, leading to unsatisfactory performance. In this paper, we propose an algorithm, called optimal neighborhood MKC with adaptive local kernels (ON-ALK), to address the two issues. Specifically, we construct adaptive local kernels to sufficiently consider the local density around individual data samples, where a different number of neighbors is selected for each sample. Further, the proposed ON-ALK algorithm boosts the representation capacity of the learned optimal kernel by relaxing it into the neighborhood of the weighted combination of the pre-specified kernels. To solve the resultant optimization problem, a three-step iterative algorithm is designed and theoretically proven to be convergent. We also study the generalization bound of the proposed algorithm. Extensive experiments have been conducted to evaluate the clustering performance. As indicated, the algorithm significantly outperforms state-of-the-art methods in the recent literature on six challenging benchmark datasets, verifying its advantages and effectiveness.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
What problem does this paper attempt to address?
This paper aims to solve two main problems in existing multiple kernel clustering (MKC) algorithms:

1. **Failure to fully consider the local density around individual data samples**: Existing MKC algorithms usually do not fully consider the local density around each data point. Local kernels are typically constructed with the same fixed number of neighbors for every sample, which cannot ensure that all sample pairs within a local kernel have high similarity. This approach ignores the differences in the local characteristics of samples, resulting in poor clustering performance.
2. **Over-restricting the representational ability of the learned optimal kernel**: Most MKC algorithms assume that the optimal kernel is a weighted combination of predefined kernels, ignoring that more powerful kernels may exist outside the set of such combinations. This assumption limits the expressive ability of the optimal kernel and degrades clustering performance.

To solve the above problems, the authors propose a new algorithm, called Optimal Neighborhood Multiple Kernel Clustering with Adaptive Local Kernels (ON-ALK). Specifically, the algorithm improves on the traditional MKC method through the following two techniques:

- **Adaptive local kernel**: An adaptive local kernel is constructed for each sample by selecting a different number of neighbors, where the number of neighbors is adjusted dynamically according to the sample's local similarity. This better captures the local density information around each sample and improves clustering accuracy.
- **Optimal neighborhood constraint**: The strict constraint that the optimal kernel must be a linear combination of predefined kernels is relaxed, allowing the optimal kernel to lie in the neighborhood of the predefined kernel combination. This increases the flexibility and representational ability of the optimal kernel and helps to find more robust kernels.
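
The effect of adapting the number of neighbors to local density can be illustrated with a minimal sketch. It assumes neighbors are chosen by a similarity threshold (here called `zeta`, matching the threshold in the formula summary); the toy data, the Gaussian kernel with unit bandwidth, and the value `0.5` are hypothetical choices for illustration, not settings from the paper.

```python
import numpy as np

def neighbor_counts(K, zeta):
    """Count, for each sample, how many samples have kernel similarity >= zeta."""
    return (K >= zeta).sum(axis=1)

# Hypothetical toy data: a tight group of three points and one isolated point.
X = np.array([[0.0], [0.1], [0.2], [5.0]])

# Gaussian kernel with unit bandwidth (an assumption made for this example).
sq_dists = (X - X.T) ** 2
K = np.exp(-sq_dists)

counts = neighbor_counts(K, zeta=0.5)
print(counts)  # -> [3 3 3 1]
```

Samples in the dense group each keep three neighbors (including themselves), while the isolated sample keeps only itself. A fixed neighbor count of three would instead force the isolated sample to pair with dissimilar points.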
Through these two techniques, the ON-ALK algorithm not only makes better use of local information but also improves the representational ability of the optimal kernel, thereby significantly improving clustering performance. Experimental results show that the algorithm significantly outperforms existing advanced clustering methods on six challenging benchmark datasets.

### Formula summary

1. **Construction of adaptive local kernel**:
   - Given a threshold \(\zeta\), the index set of the \(i\)-th sample is:
     \[ \Omega(i)=\{j\mid K(i, j)\geq\zeta\} \]
   - Construct the indicator matrix \(S(i)\in\{0,1\}^{n\times\mu(i)}\), where \(\mu(i)=|\Omega(i)|\) is the number of selected neighbors:
     \[ S(i)(i', j') = \begin{cases} 1 & \text{if } i'\in\Omega(i) \text{ and } i' \text{ is the } j'\text{-th element of } \Omega(i) \\ 0 & \text{otherwise} \end{cases} \]
   - The adaptive local kernel of the \(i\)-th sample is:
     \[ K(i)=S(i)^TKS(i)\in\mathbb{R}^{\mu(i)\times\mu(i)} \]
2. **Optimization objective**:
   - The final optimization objective is:
     \[ \min_{H, \beta, J}\frac{1}{n}\sum_{i = 1}^n\left[\text{Tr}\left(J(i)\left(I_{\mu(i)}-H(i)H(i)^T\right)\right)+\beta^T M(i)\beta\right]+\frac{\rho}{2}\|J - K_\beta\|_F^2 \]
     where \(J(i)=S(i)^TJS(i)\), \(M(i)_{pq}=\text{Tr}(K(i)_pK(i)_q)\), \(K(i)_p=S(i)^TK_pS(i)\) is the adaptive local kernel built from the \(p\)-th predefined kernel \(K_p\), and \(I_{\mu(i)}\) is the identity matrix of size \(\mu(i)\).

Through these improvements, the ON-ALK algorithm can significantly improve the clustering effect while maintaining high efficiency.
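
The construction in the formula summary can be sketched in NumPy as follows. This is an illustrative reading of the formulas, not the authors' implementation: the helper names (`index_set`, `indicator_matrix`, `local_kernel`, `local_gram`), the toy data, and the choice of a Gaussian kernel to define the index set are all assumptions. The sketch also assumes that the kernels entering \(M(i)\) are the pre-specified base kernels restricted to \(\Omega(i)\).

```python
import numpy as np

def index_set(K, i, zeta):
    """Omega(i): indices j with K[i, j] >= zeta, in ascending order."""
    return np.flatnonzero(K[i] >= zeta)

def indicator_matrix(n, omega):
    """S(i) of shape (n, mu(i)): column j' has a single 1 in row omega[j']."""
    S = np.zeros((n, len(omega)))
    S[omega, np.arange(len(omega))] = 1.0
    return S

def local_kernel(K, S):
    """K(i) = S(i)^T K S(i), the mu(i) x mu(i) block of K indexed by Omega(i)."""
    return S.T @ K @ S

def local_gram(base_kernels, S):
    """M(i)[p, q] = Tr(K_p(i) K_q(i)) over the localized base kernels (assumed reading)."""
    local = [local_kernel(Kp, S) for Kp in base_kernels]
    m = len(local)
    M = np.empty((m, m))
    for p in range(m):
        for q in range(m):
            M[p, q] = np.trace(local[p] @ local[q])
    return M

# Hypothetical toy data and two hypothetical base kernels.
X = np.array([[0.0], [0.1], [0.2], [5.0]])
K_gauss = np.exp(-(X - X.T) ** 2)   # Gaussian kernel, unit bandwidth
K_lin = X @ X.T                     # linear kernel

omega = index_set(K_gauss, 0, zeta=0.5)   # neighbors of sample 0
S = indicator_matrix(len(X), omega)
K_local = local_kernel(K_gauss, S)
M = local_gram([K_gauss, K_lin], S)
```

Multiplying by \(S(i)\) on both sides simply extracts the rows and columns of the kernel indexed by \(\Omega(i)\), so `K_local` equals `K_gauss[np.ix_(omega, omega)]`. In practice the submatrix can be taken by indexing directly, without forming \(S(i)\).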