Design and implementation of a parallel geographically weighted k-nearest neighbor classifier

Yingxia Pu, Xinyi Zhao, Guangqing Chi, Shuhe Zhao, Jiechen Wang, Zhibin Jin, Junjun Yin
DOI: https://doi.org/10.1016/j.cageo.2019.02.009
2019-01-01
Abstract: The development of high-performance classifiers is an important step toward improving the timeliness of remote sensing classification in the era of high spatial resolution. Geographically weighted k-nearest neighbors (gwk-NN), a classifier that incorporates spatial information into the traditional k-NN classifier, has been shown to be better at mitigating salt-and-pepper noise and misclassification. However, integrating spatial dependence relationships with spectral information is computationally intensive. To improve computing performance, this paper discusses two commonly used parallel strategies, data parallelism and task parallelism, for parallelizing the gwk-NN classifier in the model training and classification stages, and implements the parallel algorithm by calling MPI and GDAL in a C++ development environment on a standalone eight-core computer. We further investigate the potential performance of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification. The experimental results demonstrate that the parallel gwk-NN classifier can improve the efficiency of classifying high-resolution remotely sensed images with multiple land cover types. Specifically, data parallelism is more effective than task parallelism in both the model training and classification stages because parallel overhead plays only a minor role in total execution time. In addition, dual parallelism can combine the advantages of the data and task parallel strategies in the image classification stage, as evidenced by the two largest speedups, attained under dual parallelism I (5.28×) and II (5.73×). Of the two, dual parallelism II, in which priority is given to data decomposition, achieves the best performance by overlapping computation and data transmission, an approach compatible with the current trend toward multicore architectures.