p-adic distance and k-Nearest Neighbor classification
Elif Kartal, Fatma Çalışkan, Beyaz Başak Eskişehirli, Zeki Özen
DOI: https://doi.org/10.1016/j.neucom.2024.127400
IF: 6
2024-04-01
Neurocomputing
Abstract: The k-Nearest Neighbor (k-NN) algorithm is a well-known supervised learning method, and the distance it uses strongly affects its performance. By Ostrowski’s theorem, every nontrivial absolute value on the field of rational numbers ℚ is equivalent either to the usual absolute value or to the p-adic absolute value for some prime p. This theorem motivates using the p-adic absolute value to compute the p-adic distance between two samples in the k-NN algorithm. In this study, the p-adic distance on ℚ was coupled with the k-NN algorithm and applied to 10 well-known public datasets whose predictive attributes are categorical, numerical, or mixed (both categorical and numerical). Its performance was compared with that of the Euclidean, Manhattan, Chebyshev, and Cosine distances. The p-adic distance achieved the highest average accuracy on 5 of the 10 datasets, and on mixed datasets in particular it gave better results than the other distances. On numerical and mixed datasets, the effect of representing each number by its r-decimal value (r = 1, 2, 3) for the p-adic calculation was also examined. In addition, the parameter p of the p-adic distance was tested with the prime numbers less than 29, and the average accuracies obtained for the different values of p were very close to one another, especially on categorical and mixed datasets. The results also suggest that k-NN with the p-adic distance may be better suited to binary classification than to multi-class classification.
computer science, artificial intelligence
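The abstract does not spell out how the p-adic distance is computed for whole samples, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard definitions |x|_p = p^(-v_p(x)) for nonzero rational x (with |0|_p = 0) and d_p(a, b) = |a - b|_p. Everything else is assumed for illustration: the helper names (to_rational, sample_dist, knn_predict) are hypothetical, numeric values are turned into exact rationals by keeping r decimal places, attribute-wise distances are summed into one sample distance, categorical attributes would need integer encoding first (not shown), and the toy data is invented.

```python
from collections import Counter
from fractions import Fraction


def p_adic_valuation(n: int, p: int) -> int:
    """Return v_p(n): the exponent of the prime p in the nonzero integer n."""
    if p < 2:
        raise ValueError("p must be a prime >= 2")
    if n == 0:
        raise ValueError("v_p(0) is +infinity")
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def p_adic_abs(x: Fraction, p: int) -> float:
    """|x|_p = p^(-v_p(x)) for x != 0, with |0|_p = 0."""
    if x == 0:
        return 0.0
    # Fraction stores numerator/denominator in lowest terms,
    # so v_p(a/b) = v_p(a) - v_p(b).
    v = p_adic_valuation(x.numerator, p) - p_adic_valuation(x.denominator, p)
    return float(p) ** (-v)


def p_adic_dist(a: Fraction, b: Fraction, p: int) -> float:
    """p-adic distance d_p(a, b) = |a - b|_p on the rationals."""
    return p_adic_abs(a - b, p)


def to_rational(x: float, r: int) -> Fraction:
    """Hypothetical preprocessing: keep r decimal places of x as an exact rational."""
    scale = 10 ** r
    # str(x) avoids binary floating-point noise before rounding.
    return Fraction(round(Fraction(str(x)) * scale), scale)


def sample_dist(u, v, p: int) -> float:
    """Assumed aggregation: sum of attribute-wise p-adic distances."""
    return sum(p_adic_dist(a, b, p) for a, b in zip(u, v))


def knn_predict(train_X, train_y, query, k: int = 3, p: int = 2):
    """Plain majority-vote k-NN using sample_dist as the metric."""
    ranked = sorted(
        ((sample_dist(x, query, p), y) for x, y in zip(train_X, train_y)),
        key=lambda t: t[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration only.
r = 1
raw_X = [[1.0, 2.5], [3.0, 0.5], [1.5, 2.0], [4.0, 1.0]]
raw_y = ["a", "b", "a", "b"]
X = [[to_rational(v, r) for v in row] for row in raw_X]
query = [to_rational(v, r) for v in [1.0, 2.5]]
print(knn_predict(X, raw_y, query, k=3, p=2))  # prints: b
```

In this toy example the 2-adic sample distances from the query to the four rows are 0, 1, 4 and 3, so the three nearest neighbours are rows 0, 1 and 3 and the vote is "b". With the Euclidean distance the three nearest rows would be 0, 2 and 1, giving "a". The p-adic metric treats 1.5 and 1.0 as far apart because their difference, 1/2, has 2-adic absolute value 2, which illustrates how differently the p-adic distance ranks neighbours.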