Abstract: To cluster data that are not linearly separable in the original feature space, k-means clustering was extended to a kernel version. However, the performance of kernel k-means clustering depends largely on the choice of kernel function. To mitigate this problem, multiple kernel learning has been introduced into k-means clustering to obtain an optimal kernel combination for clustering.
Despite the success of multiple kernel k-means clustering in various scenarios, few existing methods update the combination coefficients based on the diversity of the kernels. As a result, the selected kernels are highly redundant, which degrades clustering performance and efficiency. In this article, we address this problem from the perspective of subset selection. In particular, we first propose an effective strategy to select a diverse subset of the prespecified kernels as the representative kernels, and then incorporate the subset selection process into the framework of multiple kernel k-means clustering. The representative kernels are indicated by significant combination weights. Because the resulting objective function is nonconvex, we develop an alternating minimization method that optimizes the combination coefficients of the selected kernels and the cluster membership alternately. We also develop an efficient optimization method to reduce the time complexity of optimizing the kernel combination weights.
Finally, extensive experiments on benchmark and real-world data sets demonstrate the effectiveness and superiority of our approach in comparison with existing methods.
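The alternating scheme described in the abstract can be illustrated with a minimal sketch of the standard relaxed multiple kernel k-means baseline. This is not the representative-kernel selection method of the article: the diversity-based subset selection and the efficient weight update are not specified in the abstract and are omitted here. The names `rbf_kernel`, `mkkm`, `w`, and `H`, the combined kernel `sum_p w_p^2 K_p`, and the simplex constraint on `w` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative helper)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def mkkm(kernels, n_clusters, n_iter=20, eps=1e-12):
    """Relaxed multiple kernel k-means via alternating minimization.

    Assumed objective: min over w, H of Tr(K_w (I - H H^T)),
    with K_w = sum_p w_p^2 K_p, H^T H = I, w >= 0, sum(w) = 1.
    """
    m = len(kernels)
    n = kernels[0].shape[0]
    w = np.full(m, 1.0 / m)  # start from uniform kernel weights
    for _ in range(n_iter):
        # Combine the base kernels with the current weights.
        K = sum(wp ** 2 * Kp for wp, Kp in zip(w, kernels))
        # Membership step: the relaxed cluster indicator H is given by
        # the top n_clusters eigenvectors of the combined kernel.
        _, vecs = np.linalg.eigh(K)  # eigenvalues in ascending order
        H = vecs[:, -n_clusters:]
        # Weight step: with H fixed, minimizing sum_p w_p^2 a_p on the
        # simplex has the closed form w_p proportional to 1 / a_p.
        P = np.eye(n) - H @ H.T
        a = np.array([np.trace(Kp @ P) for Kp in kernels])
        inv = 1.0 / np.maximum(a, eps)
        w = inv / inv.sum()
    return w, H
```

In this sketch, discrete cluster labels would be obtained by running ordinary k-means on the rows of `H`. Because the weight step only rewards fit to the current partition, near-identical kernels can all receive large weights, which is the redundancy the abstract sets out to reduce.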

KM-MIC: an Improved Maximum Information Coefficient Based on K-Medoids Clustering

An improved algorithm for the maximal information coefficient and its application

Improved Approximation Algorithm for Maximal Information Coefficient

Fast Search Local Extremum for Maximal Information Coefficient (MIC)

The Generalized Mean Information Coefficient

Detecting novel multi-variable associations in big data based on MIC

Kernel correlation–dissimilarity for Multiple Kernel k-Means clustering

Analysing Large Biological Data Sets with an Improved Algorithm for MIC

Detecting Unbiased Associations in Large Data Sets

CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data

Equitability Analysis of the Maximal Information Coefficient, with Comparisons

Metric for measuring the effectiveness of clustering of DNA microarray expression

Analyzing Large Biological Datasets with an Improved Algorithm for MIC

Min max kurtosis distance based improved initial centroid selection approach of K-means clustering for big data mining on gene expression data

An Improved K-Means Algorithm Based on Kurtosis Test

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient

A Kernel-Based Intuitionistic Fuzzy C-Means Clustering Using Improved Multi-Objective Immune Algorithm

Multiple Kernel k-Means Clustering by Selecting Representative Kernels

Railway Accidents Analysis Based On The Improved Algorithm Of The Maximal Information Coefficient

The Application of an Improved K-Means Clustering Method in Microarray Gene Expressing Data