Abstract: Convolutional neural networks (CNNs) play a vital role in numerous classification tasks; however, their lack of interpretability limits their application in medical image diagnosis. To tackle this issue, we propose Attention U-net, an interpretable classification model that generates high-resolution localization maps for the predicted class. The novelty of our model lies in adopting an upsampling-concatenating-convolution structure to create a fine-grained segmentation map and in using attention pooling over the prior mask to bridge segmentation and classification. Because the relationship between segmentation and classification matches the formulation of multiple instance learning (MIL), the attention pooling can be viewed as a MIL pooling function. In the attention pooling, the attention weights can be read as a localization map and thus provide evidence for the classification. We integrate our model with Grad-CAM (gradient-weighted class activation mapping), a widely used method for CNN localization, and we prove that our attention-based localization map is highly correlated with the Grad-CAM-integrated localization map. We apply the proposed model to the automatic diagnosis of lung diseases from chest X-ray images. Experimental results show that our model achieves high performance in both classification and interpretability.
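The idea of attention pooling as a MIL pooling function can be illustrated with a minimal sketch. It is not the model described above: the function name `attention_pool`, the softmax normalization over pixels, and the toy inputs are all assumptions made for illustration. Each pixel is treated as an instance with its own score, the attention weights are normalized over all pixels, and the image-level (bag-level) score is the attention-weighted sum, so the weight map doubles as a localization map.

```python
import math

def attention_pool(instance_scores, attention_logits):
    """Pool per-pixel scores into one image-level score (illustrative only).

    instance_scores  : 2-D list (H x W) of per-pixel scores for one class
    attention_logits : 2-D list (H x W) of unnormalized attention values
    Returns (image_score, attention_map); attention_map is H x W and
    sums to 1 over all pixels.
    """
    flat_logits = [v for row in attention_logits for v in row]
    peak = max(flat_logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in attention_logits]
    total = sum(sum(row) for row in exps)
    # Softmax over all pixels (an assumed normalization).
    attention_map = [[e / total for e in row] for row in exps]
    # Bag-level score: attention-weighted sum of the instance scores.
    image_score = sum(
        a * s
        for a_row, s_row in zip(attention_map, instance_scores)
        for a, s in zip(a_row, s_row)
    )
    return image_score, attention_map

# Toy 2 x 2 example: the top-right pixel receives the largest attention.
scores = [[0.1, 0.9], [0.2, 0.3]]
logits = [[0.0, 2.0], [0.0, 0.0]]
image_score, attention_map = attention_pool(scores, logits)
```

With uniform attention logits this pooling reduces to a plain average over pixels; non-uniform logits shift the image-level score toward the pixels that the attention map highlights.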

Attention U-net for Interpretable Classification on Chest X-ray Image