Asymptotic behavior of some multicategory classification methods for high-dimensional data

Dorilian García-Cerino, Addy Bolívar-Cimé, Victor Pérez-Abreu
DOI: https://doi.org/10.1080/03610918.2024.2347923
2024-05-18
Communications in Statistics - Simulation and Computation
Abstract: We consider multicategory extensions of binary discrimination methods via one-versus-one (OVO) or one-versus-rest (OVR) methodologies, focusing on extensions of the binary classification by linear mean difference (MD), support vector machine (SVM), maximal data piling (MDP), and distance weighted discrimination (DWD) via OVO, and the multicategory extension of MD via OVR, in the context of high-dimensional and low sample size (HDLSS) data. The asymptotic behavior of OVO-MD, OVO-SVM, OVO-MDP and OVO-DWD is described when the dimension of the data increases and the sample size is fixed, in terms of the probabilities of correct classification of a new data point, finding sufficient conditions for the correct classification probabilities to converge to one as the dimension approaches infinity. As in the binary case, OVO-MD, OVO-SVM and OVO-MDP have the same asymptotic behavior, while OVO-DWD could behave differently. We also consider the asymptotic behavior of the OVR-MD methodology, providing necessary and sufficient conditions for a new data point of a given class to be correctly classified with probability tending to one. A simulation experiment is conducted to further compare the methodologies and to consider the four binary methods in the OVR case. We evaluate the performance of the considered methods using a microarray data set.
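To make the OVO-MD construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual binary mean difference rule (assign a point to the class whose mean lies on its side of the hyperplane through the midpoint of the two class means, with normal vector equal to their difference) combined with majority voting over all class pairs. The function name `ovo_md_predict`, the tie-breaking rule (smallest label wins), and the simulated data are illustrative assumptions.

```python
import itertools
import numpy as np


def ovo_md_predict(X_train, y_train, x_new):
    """Classify x_new by one-versus-one voting over binary mean difference rules.

    Assumption: ties in the vote are broken in favour of the smallest label.
    """
    classes = np.unique(y_train)
    # Sample mean of each class.
    means = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    votes = {c: 0 for c in classes}
    for a, b in itertools.combinations(classes, 2):
        w = means[a] - means[b]               # normal vector of the MD hyperplane
        midpoint = (means[a] + means[b]) / 2  # the hyperplane passes through here
        winner = a if np.dot(x_new - midpoint, w) > 0 else b
        votes[winner] += 1
    # max() returns the first maximiser, i.e. the smallest label on ties.
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    # Illustrative HDLSS setting: dimension far larger than the sample size.
    rng = np.random.default_rng(0)
    d, n_per_class = 1000, 5
    shifts = [0.0, 1.0, -1.0]  # hypothetical class mean shifts
    X = np.vstack([rng.normal(s, 1.0, size=(n_per_class, d)) for s in shifts])
    y = np.repeat([0, 1, 2], n_per_class)
    x_new = rng.normal(1.0, 1.0, size=d)  # drawn from class 1
    print(ovo_md_predict(X, y, x_new))
```

Replacing the mean difference direction `w` with the direction produced by SVM, MDP or DWD would give the corresponding OVO variant under the same voting scheme.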
statistics & probability