Data-driven cluster analysis method: a novel outliers detection method in multivariate data

A. R. Duarte, J. J. Barbosa, H. S. R. Martins, F. L. P. Oliveira
DOI: https://doi.org/10.1080/03610918.2024.2376872
2024-07-13
Communications in Statistics - Simulation and Computation
Abstract: Detecting multivariate outliers is crucial in statistical studies, because statistical applications that do not identify possible outliers may produce incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis, one that relies on information inherent in the data itself. We compare the methodology with three widely used detection methods. The comparison considers detection techniques based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time required to run each procedure. The new technique showed marked superiority over the others.
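The abstract does not describe the proposed cluster-based procedure, so the sketch below covers only the kind of baseline it is compared against: a classical Mahalanobis-distance detector with a chi-square cutoff, scored by sensitivity, specificity, and accuracy. It is not the paper's method or its specific comparators. The restriction to bivariate data (which gives the chi-square(2) cutoff the closed form $-2\ln\alpha$), the tail probability $\alpha = 0.025$, and the function names `mahalanobis_2d`, `flag_outliers`, and `confusion_metrics` are all hypothetical choices made for illustration.

```python
import math


def mahalanobis_2d(points):
    """Squared Mahalanobis distances of 2-D points from the sample mean,
    using the classical (non-robust) sample covariance.
    Hypothetical helper for illustration only."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance entries
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2.append(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
    return d2


def flag_outliers(points, alpha=0.025):
    """Flag points whose squared distance exceeds the chi-square(2) upper
    alpha quantile. For 2 degrees of freedom, P(X > q) = exp(-q / 2), so
    q = -2 ln(alpha). Assumed cutoff rule, not taken from the paper."""
    cutoff = -2.0 * math.log(alpha)
    return [d > cutoff for d in mahalanobis_2d(points)]


def confusion_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy, treating 'outlier' as the
    positive class. Assumes at least one true outlier and one true inlier."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(actual)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical data: a 5x5 grid of inliers plus one planted outlier
    points = [(x, y) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
    actual = [False] * 25 + [True]
    flags = flag_outliers(points)
    print([i for i, f in enumerate(flags) if f])  # → [25]
    print(confusion_metrics(flags, actual))  # → (1.0, 1.0, 1.0)
```

Because the classical mean and covariance are themselves pulled toward outliers (masking), a single planted point is easy to detect here, while clusters of outliers may not be. That weakness is one motivation for alternatives to plain Mahalanobis-distance rules.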
statistics & probability