Kernel K-Nearest Neighbor Algorithm As a Flexible SAR Modeling Tool

Dong-Sheng Cao, Jian-Hua Huang, Jun Yan, Liang-Xiao Zhang, Qian-Nan Hu, Qing-Song Xu, Yi-Zeng Liang
DOI: https://doi.org/10.1016/j.chemolab.2012.01.008
IF: 4.175
2012-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: A kernel version of the k-nearest neighbor (k-NN) algorithm has been developed to model the complex relationship between molecular descriptors and the bioactivities of compounds. Kernel k-NN performs the original k-NN algorithm after mapping the training samples from the input space into a high-dimensional feature space. It is easily constructed because the distance between samples in the feature space can be computed directly from the kernel used. The resulting kernel k-NN is flexible enough to handle complex nonlinear relationships; more importantly, it can also handle non-vectorial data simply by defining a suitable kernel. Results on several real SAR datasets indicate that the performance of kernel k-NN is comparable to that of support vector machine methods. It can be regarded as an alternative modeling technique for several chemical problems, including the study of structure-activity relationships (SAR). The source code implementing kernel k-NN in R is freely available at http://code.google.com/p/kernelmethods/. (C) 2012 Elsevier B.V. All rights reserved.
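To make the construction concrete, here is a minimal Python sketch of kernel k-NN classification. It is not the authors' R implementation. It relies on the standard identity that the squared distance between two samples in feature space is k(x, x) - 2 k(x, y) + k(y, y). The function names, the Gaussian (RBF) kernel, the gamma value, and the majority-vote rule are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from collections import Counter

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def kernel_distance_sq(x, y, kernel):
    """Squared feature-space distance computed from kernel values only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def kernel_knn_predict(train_x, train_y, query, kernel=rbf_kernel, k=3):
    """Predict a label by majority vote among the k nearest training samples,
    where 'nearest' is measured in the kernel-induced feature space."""
    dists = [kernel_distance_sq(query, x, kernel) for x in train_x]
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration, not from the paper's SAR datasets).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["inactive"] * 3 + ["active"] * 3
prediction = kernel_knn_predict(train_x, train_y, [0.2, 0.1])
```

Because the samples enter only through the `kernel` argument, the same sketch would accept non-vectorial inputs (for example strings or graphs) given a kernel function defined on them.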