Eigen selection in spectral clustering: a theory guided practice

Xiao Han, Xin Tong, Yingying Fan
DOI: https://doi.org/10.1080/01621459.2021.1917418
Impact factor: 4.369
2021-04-16
Journal of the American Statistical Association
Abstract: Based on a Gaussian mixture type model of K components, we derive eigen selection procedures that improve the usual spectral clustering algorithms in high-dimensional settings, which typically act on the top few eigenvectors of an affinity matrix (e.g., X⊤X) derived from the data matrix X. Our selection principle formalizes two intuitions: (i) eigenvectors should be dropped when they have no clustering power; (ii) some eigenvectors corresponding to smaller spiked eigenvalues should be dropped due to estimation inaccuracy. Our selection procedures lead to new spectral clustering algorithms: ESSC for K = 2 and GESSC for K > 2. The newly proposed algorithms enjoy better stability and compare favorably against canonical alternatives, as demonstrated in extensive simulation and multiple real data studies. Supplementary materials for this article are available online.
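The baseline the abstract describes (clustering on the top few eigenvectors of X⊤X) and its first selection intuition (drop eigenvectors with no clustering power) can be illustrated with a minimal NumPy sketch for K = 2. It assumes X is p × n with one observation per column, so X⊤X is the n × n affinity matrix. The scoring function `split_score`, its `threshold`, and the other helper names are hypothetical choices made for illustration; they are not the ESSC criterion derived in the paper, and the second intuition (dropping eigenvectors of smaller spiked eigenvalues because of estimation inaccuracy) is not modeled here.

```python
import numpy as np


def top_eigenpairs(X, m):
    """Return the m largest eigenvalues/eigenvectors of the n x n affinity X^T X.

    Assumes X is p x n with one observation per column.
    """
    vals, vecs = np.linalg.eigh(X.T @ X)      # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:m]        # indices of the m largest eigenvalues
    return vals[order], vecs[:, order]


def split_score(v):
    """Fraction of the variance of a 1-D vector explained by its best two-group split.

    Hypothetical proxy for "clustering power" (NOT the ESSC criterion): close to 1 for
    two tight, well-separated groups, about 2/pi (~0.64) for a large i.i.d. Gaussian sample.
    """
    s = np.sort(np.asarray(v, dtype=float))
    n = s.size
    mu = s.mean()
    total = np.sum((s - mu) ** 2)
    if total <= 0.0:                          # constant vector: no clustering power
        return 0.0
    k = np.arange(1, n)                       # size of the left group s[:k]
    csum = np.cumsum(s)[:-1]                  # csum[k-1] = sum of s[:k]
    left_mean = csum / k
    right_mean = (s.sum() - csum) / (n - k)
    between = k * (left_mean - mu) ** 2 + (n - k) * (right_mean - mu) ** 2
    return float(np.max(between / total))


def two_means(Z, iters=100):
    """Plain Lloyd's k-means with K = 2 on the rows of Z, with a deterministic init."""
    # init: row farthest from the centroid, then the row farthest from that one
    c0 = Z[np.argmax(np.linalg.norm(Z - Z.mean(axis=0), axis=1))]
    c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
    centers = np.vstack([c0, c1])
    for _ in range(iters):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)  # n x 2 distances
        labels = np.argmin(d, axis=1)
        if np.bincount(labels, minlength=2).min() == 0:
            break                             # degenerate split; keep current labels
        new = np.vstack([Z[labels == j].mean(axis=0) for j in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_two_clusters(X, m=3, threshold=0.8):
    """K = 2 spectral clustering on the subset of the top-m eigenvectors that pass
    the hypothetical split_score threshold."""
    _, vecs = top_eigenpairs(X, m)
    keep = [j for j in range(m) if split_score(vecs[:, j]) >= threshold]
    if not keep:                              # fall back to the leading eigenvector
        keep = [0]
    return two_means(vecs[:, keep]), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 200, 100
    y = np.repeat([1.0, -1.0], n // 2)        # two balanced groups
    X = rng.standard_normal((p, n))
    X[0] += 5.0 * y                           # mean shift of +/-5 in the first coordinate only
    labels, keep = spectral_two_clusters(X)
    print("kept eigenvectors:", keep)
    print("cluster sizes:", np.bincount(labels, minlength=2))
```

On this toy sample the leading eigenvector should carry the group signal and pass the score, while the next eigenvectors are mostly noise and tend to be dropped; feeding such noise columns into k-means is exactly what intuition (i) warns against. The fallback to the leading eigenvector is an arbitrary design choice for the sketch.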
statistics & probability