Real-time Image Recognition Using Weighted Spatial Pyramid Networks

Xiaoning Zhu, Qingyue Meng, Lize Gu
DOI: https://doi.org/10.1007/s11554-017-0743-y
IF: 2.293
2018-01-01
Journal of Real-Time Image Processing
Abstract: The latest-generation earth observation instruments on airborne and satellite platforms are currently producing an almost continuous high-dimensional data stream. This exponentially growing data poses a new challenge for real-time image processing and recognition. Making full and effective use of the spectral and spatial structure information in high-resolution remote sensing images is the key to processing and recognizing such data. In this paper, the adaptive multipoint moment estimation (AMME) stochastic optimization algorithm is proposed for the first time; it uses finite lower-order moments and adds estimating points. This algorithm not only reduces the probability of becoming trapped in a local optimum during learning, but also improves the convergence rate of the convolutional neural network (LeCun et al. in Advances in neural information processing systems, 1990). Second, because remote sensing images have complex backgrounds and small sensitive targets, we propose a feature extraction method named weighted pooling, which automatically discovers and locates small targets and gives them high weights, to further improve the performance of real-time image recognition. We combine AMME and weighted pooling with the spatial pyramid representation (Harada et al. in Comput Vis Pattern Recognit 1617–1624, 2011) algorithm to form a new, multiscale, and multilevel real-time image recognition model, which we name weighted spatial pyramid networks (WspNet). Finally, we test WspNet on the MNIST, ImageNet, and natural-disaster remote sensing data sets. Compared with other real-time image recognition models, WspNet achieves a new state of the art in convergence rate and image feature extraction relative to conventional stochastic gradient descent methods [such as AdaGrad, AdaDelta and Adam (Zeiler in Comput Sci, 2012; Kingma and Ba in Comput Sci, 2014; Duchi et al. in J Mach Learn Res 12(7):2121–2159, 2011)] and pooling methods [such as max-pooling, avg-pooling and stochastic-pooling (Zeiler and Fergus in stochastic-pooling for regularization of deep convolutional neural networks, 2013)].
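The abstract does not give the weighted pooling formula, so the following is only an illustrative sketch of the general idea: pooling in which strong activations (for example, small targets) receive high weights, applied over a multilevel spatial pyramid. The softmax weighting, the `sharpness` parameter, the grid levels, and the function names `weighted_pool` and `pyramid_features` are all assumptions made for this example and are not taken from the paper.

```python
import math

def weighted_pool(values, sharpness=1.0):
    """Softmax-weighted average: larger activations receive larger weights.

    Assumed weighting scheme, not the paper's definition.
    sharpness=0 reduces to average pooling; large values approach max pooling.
    """
    peak = max(values)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(sharpness * (v - peak)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def pyramid_features(feature_map, levels=(1, 2, 4), sharpness=1.0):
    """Pool a 2-D map over an n x n grid for each n in levels, then concatenate.

    The map must have at least max(levels) rows and columns so no cell is empty.
    """
    height, width = len(feature_map), len(feature_map[0])
    features = []
    for n in levels:
        for i in range(n):
            r0, r1 = i * height // n, (i + 1) * height // n
            for j in range(n):
                c0, c1 = j * width // n, (j + 1) * width // n
                cell = [feature_map[r][c]
                        for r in range(r0, r1) for c in range(c0, c1)]
                features.append(weighted_pool(cell, sharpness))
    return features
```

For a 4 x 4 map with levels (1, 2, 4), this returns 1 + 4 + 16 = 21 values. A single bright pixel on a dark background dominates the coarsest-level feature instead of being averaged away, which is the behavior the abstract attributes to giving small targets high weights.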