Convex Clustering Method for Compositional Data Via Sparse Group Lasso

Xiaokang Wang, Huiwen Wang, Shanshan Wang, Jidong Yuan
DOI: https://doi.org/10.1016/j.neucom.2020.10.105
IF: 6
2021-01-01
Neurocomputing
Abstract: High-dimensional sparse clustering with compositional data is of great practical importance, as exemplified by applications in the analysis of high-throughput gene expression profiles. In this paper, we develop a compositional clustering framework based on convex clustering, which is a convex relaxation of hierarchical clustering that incorporates a fused penalty term on the cluster prototypes. To deal explicitly with high dimensionality and sparsity, we propose Compositional Convex Clustering with Sparse Group Lasso (CCC-SGL). The isometric logratio (ilr) transformation is first applied to map the compositions from the simplex to standard Euclidean space. Then, a group lasso penalty and a lasso penalty are imposed on the cluster centers, which together select informative features and promote within-feature sparsity. The proposed convex clustering formulation is solved efficiently with the proximal gradient descent algorithm within the Alternating Direction Method of Multipliers (ADMM) framework. Simulation studies are carried out to evaluate the performance of the proposed methodology, and a real microbiome sequencing data set is analyzed. (c) 2020 Elsevier B.V. All rights reserved.
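The two building blocks named in the abstract, the ilr transformation and the combined lasso and group lasso penalty, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes strictly positive compositions (zeros would have to be replaced beforehand), uses one particular orthonormal basis (pivot-type coordinates) for the ilr map, and treats a single feature column of the cluster-center matrix as one group. The function names `ilr_basis`, `ilr`, and `prox_sparse_group_lasso` and the parameters `lam_lasso` and `lam_group` are hypothetical. The second function is the standard closed-form proximal operator of the sparse group lasso penalty for one group, the kind of step a proximal gradient update inside ADMM would call.

```python
import numpy as np


def ilr_basis(d):
    """Orthonormal basis (d x (d-1)) of the clr hyperplane (assumed pivot-type basis)."""
    V = np.zeros((d, d - 1))
    for j in range(d - 1):
        r = d - j - 1  # number of parts that come after part j
        V[j, j] = np.sqrt(r / (r + 1))
        V[j + 1:, j] = -1.0 / np.sqrt(r * (r + 1))
    return V


def ilr(X):
    """Map strictly positive compositions (rows of X) to d-1 Euclidean coordinates."""
    X = np.asarray(X, dtype=float)
    # Each basis column sums to zero, so log(X) @ V equals clr(X) @ V.
    return np.log(X) @ ilr_basis(X.shape[-1])


def prox_sparse_group_lasso(v, lam_lasso, lam_group):
    """Prox of lam_lasso * ||u||_1 + lam_group * ||u||_2 for one group vector v."""
    v = np.asarray(v, dtype=float)
    # Step 1: elementwise soft-thresholding (lasso part).
    u = np.sign(v) * np.maximum(np.abs(v) - lam_lasso, 0.0)
    # Step 2: shrink the whole group toward zero (group lasso part).
    norm = np.linalg.norm(u)
    if norm <= lam_group:
        return np.zeros_like(u)  # the entire group (feature) is dropped
    return (1.0 - lam_group / norm) * u


if __name__ == "__main__":
    X = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])  # illustrative compositions
    print(ilr(X))
    print(prox_sparse_group_lasso([3.0, -0.5, 1.0], lam_lasso=1.0, lam_group=1.0))
```

In this sketch the lasso step zeroes individual entries of a feature's column, while the group step can zero the whole column at once, which corresponds to the within-feature sparsity and feature selection described above.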