A classification framework for multivariate compositional data with Dirichlet feature embedding

Jie Gu, Bin Cui, Shan Lu
DOI: https://doi.org/10.1016/j.knosys.2020.106614
2021-01-01
Abstract:<p>Compositional data, which carry relative or structural information about the parts of a whole, occur commonly in many disciplines and practical scenarios. Yet relatively few works address the classification of multivariate compositional data with different numbers of parts using machine learning. This is because compositional data are inherently constrained to unit sum, so existing methods cannot be applied directly. In particular, multivariate analysis methods for compositional variables with unequal numbers of parts have not been sufficiently investigated. Moreover, designing a good classification model is a complicated task: besides the learning algorithm, data quality is also an essential determinant, yet it has rarely been considered. In this paper, we propose an effective framework for multivariate compositional data classification. Specifically, a Dirichlet feature embedding is applied to the original compositional features with the goals of removing the constraint, obtaining high-quality training data, and reducing the dimension. A support vector machine is then used to build the classification model. Results from a simulation study and a real-world dataset show that the proposed method achieves good performance.</p>
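The abstract does not spell out how the Dirichlet feature embedding is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each compositional variable is summarized by its log-density under a Dirichlet distribution fitted per class, which yields an unconstrained, fixed-length feature vector even when the variables have different numbers of parts. The function names (`dirichlet_logpdf`, `fit_dirichlet_moments`, `embed`), the method-of-moments fit, and the toy data are all hypothetical choices made for illustration.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a composition x (parts > 0, summing to 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def fit_dirichlet_moments(samples):
    """Method-of-moments estimate of alpha (an assumed, simple estimator).

    Uses Var(x_0) = m_0 * (1 - m_0) / (alpha_total + 1) to recover the
    total concentration; requires the first part to vary across samples.
    """
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    var0 = sum((s[0] - means[0]) ** 2 for s in samples) / n
    alpha_total = means[0] * (1.0 - means[0]) / var0 - 1.0
    return [m * alpha_total for m in means]


def embed(record, class_params):
    """Map a record (one composition per variable) to one scalar per (class, variable)."""
    return [
        dirichlet_logpdf(x, alphas[v])
        for _, alphas in sorted(class_params.items())
        for v, x in enumerate(record)
    ]


# Toy training data (made up): variable 0 has 3 parts, variable 1 has 2 parts.
train = {
    "a": [([0.6, 0.3, 0.1], [0.7, 0.3]),
          ([0.5, 0.4, 0.1], [0.8, 0.2]),
          ([0.7, 0.2, 0.1], [0.6, 0.4])],
    "b": [([0.1, 0.3, 0.6], [0.2, 0.8]),
          ([0.2, 0.3, 0.5], [0.3, 0.7]),
          ([0.1, 0.2, 0.7], [0.4, 0.6])],
}

# Fit one Dirichlet per class and per variable.
class_params = {
    label: [fit_dirichlet_moments([rec[v] for rec in recs]) for v in range(2)]
    for label, recs in train.items()
}

# Embed a new record: 5 constrained parts become 4 unconstrained features,
# ordered as (class a, var 0), (class a, var 1), (class b, var 0), (class b, var 1).
features = embed(([0.55, 0.35, 0.10], [0.75, 0.25]), class_params)
print(features)
```

Under this reading, the embedded vectors are ordinary real-valued features and could be passed to any support vector machine implementation for the classification step.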
computer science, artificial intelligence