Channel and band attention embedded 3D CNN for model development of hyperspectral image in object-scale analysis

Fengle Zhu, Jianping Cai, Mengzhu He, Xiaoli Li
DOI: https://doi.org/10.1016/j.chemolab.2022.104537
IF: 4.175
2022-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Recently, there has been a rising trend of employing convolutional neural networks (CNNs) to model complex, high-dimensional hyperspectral images in object-scale analysis. Compared with 1D and 2D CNNs, which extract only spectral or only spatial features, the 3D CNN offers a more effective way to extract integrated deep spectral-spatial features simultaneously. Because convolution operates within a local receptive field, computer vision studies have incorporated attention mechanisms into 2D CNNs to exploit the relationships between features for adaptive feature refinement. However, no exploration has been reported on incorporating attention mechanisms into 3D CNNs for model development of hyperspectral images in object-scale analysis. In this study, we investigated an improved 3D CNN architecture with embedded attention modules for adaptive feature refinement in object-scale hyperspectral image modeling. In addition to the adapted channel attention module, a band attention module was specially designed to learn band-wise relationships. Based on the 3D ResNet architecture, various modifications to the arrangement and structure of the channel and band attention modules were explored systematically for higher modeling performance. An exemplar hyperspectral image dataset of basil leaves, used to predict their relative chlorophyll content (RCC), was applied to evaluate the proposed model. Comprehensive comparison experiments showed performance improvement after adding attention modules to the residual block of 3D ResNet, demonstrating the effectiveness of adaptive feature refinement along the channel and band dimensions through the learned attention maps. The sequential channel-band attention module achieved the highest model performance, with a testing determination coefficient (R²) of 0.8998. The results indicated the effectiveness of the channel and band attention embedded 3D CNN for model development of hyperspectral images in object-scale analysis.
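The abstract does not give the internals of the attention modules, so the following is only a minimal NumPy sketch of what a sequential channel-then-band gate on a 3D feature map might look like. It assumes a squeeze-and-excitation style design (global average pooling, a two-layer bottleneck, a sigmoid gate), a feature map laid out as (channels, bands, height, width), and untrained random weights. The function names, shapes, and reduction ratios are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def axis_attention(x, axis, w1, w2):
    """Gate a (C, D, H, W) feature map along one axis (assumed SE-style design)."""
    # Squeeze: average over every axis except the one being attended to.
    other_axes = tuple(i for i in range(x.ndim) if i != axis)
    squeezed = x.mean(axis=other_axes)            # shape (n,)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w1 @ squeezed, 0.0)       # shape (n // reduction,)
    gate = sigmoid(w2 @ hidden)                   # shape (n,)
    # Broadcast the gate back over the feature map.
    shape = [1] * x.ndim
    shape[axis] = -1
    return x * gate.reshape(shape), gate


def make_weights(n, reduction, rng):
    """Random stand-ins for learned weights (hypothetical, untrained)."""
    hidden = max(n // reduction, 1)
    w1 = rng.standard_normal((hidden, n)) * 0.1
    w2 = rng.standard_normal((n, hidden)) * 0.1
    return w1, w2


def sequential_channel_band_attention(x, channel_w, band_w):
    """Apply channel attention (axis 0) first, then band attention (axis 1)."""
    x, channel_gate = axis_attention(x, 0, *channel_w)
    x, band_gate = axis_attention(x, 1, *band_w)
    return x, channel_gate, band_gate


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 4, 4))     # C=8 channels, D=16 bands
channel_w = make_weights(8, reduction=2, rng=rng)
band_w = make_weights(16, reduction=4, rng=rng)
refined, channel_gate, band_gate = sequential_channel_band_attention(
    features, channel_w, band_w
)
```

In a residual block, a refined map of this kind would typically be added back to the block input before the next layer. Where exactly the modules sit inside the block is one of the arrangements the study compares, so this sketch does not fix that choice.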