Kent feature embedding for classification of compositional data with zeros

Shan Lu, Wenjing Wang, Rong Guan
DOI: https://doi.org/10.1007/s11222-024-10382-z
IF: 2.3241
2024-02-02
Statistics and Computing
Abstract: Compositional data pose challenges to current classification methods owing to their non-negativity and unit-sum constraints, especially when some of the components are zero. In this paper, we develop an effective classification method for multivariate compositional data in which some components equal zero. Specifically, a Kent feature embedding technique is first proposed to transform compositional data and improve data quality. We then use a support vector machine, a state-of-the-art machine learning model, to build the classifier. Numerical simulations show that the proposed method is effective. Results on multiple real datasets, including species classification, day-night image classification and household consumption pattern recognition, further verify that the proposed method achieves good classification performance and outperforms competing methods. This method would help to broaden the practical use of compositional data with zeros in classification tasks.
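The abstract does not spell out the Kent feature embedding, so the following is only a minimal sketch of why zeros are awkward and how a sphere-based representation can sidestep them. It assumes a plain square-root transform, which is not the paper's method: log-ratio transforms break down when a component is zero because log(0) is undefined, whereas the square root of a unit-sum composition is always defined and lands on the unit sphere, the domain on which the Kent distribution lives. The function name `sqrt_embed` and the sample values are hypothetical.

```python
import math


def sqrt_embed(composition):
    """Map a composition (non-negative parts) to a point on the unit sphere.

    Illustrative stand-in only; the paper's Kent feature embedding is
    not described in the abstract and is not reproduced here.
    """
    if any(x < 0 for x in composition):
        raise ValueError("components must be non-negative")
    total = sum(composition)
    if total <= 0:
        raise ValueError("composition must have a positive total")
    # Re-close to unit sum, then take square roots; zero parts stay zero.
    return [math.sqrt(x / total) for x in composition]


sample = [0.5, 0.5, 0.0]  # made-up composition with one zero component
point = sqrt_embed(sample)
norm = math.sqrt(sum(v * v for v in point))  # Euclidean length of the image
```

Points produced this way could then be passed as features to a classifier such as a support vector machine, which is the second stage the abstract describes.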
statistics & probability; computer science, theory & methods