Interpretable Function Embedding and Module in Convolutional Neural Networks

Wei Han, Zhili Qin, Junming Shao
DOI: https://doi.org/10.1109/icme57554.2024.10687380
2024-01-01
Abstract: In this study, we aim to interpret the hidden semantics of each unit in a convolutional neural network by abstracting the locally activated patterns of each neuron into a corresponding global function. Unlike existing quantitative explanations based on hidden semantics, which require one-by-one comparison against pixel-wise annotated data, the proposed active interpretability method yields function embeddings after training without any semantic annotation. Specifically, the function of a neuron is defined as its global expected activated pattern, so the feature space and the function embedding space are aligned without supervision during training. A synchronization mechanism is introduced to aggregate scattered function embeddings into function modules, transforming the excessive gray-box interpretations into white-box ones. Moreover, hard routing guided by the function embeddings is employed to ensure semantic specificity. We explore the aggregated function modules to showcase the qualitative interpretability of functionally motivated networks. The proposed method also achieves superior results on quantitative interpretability metrics such as accuracy, faithfulness, robustness, and complexity.
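
The abstract gives no formulas, so the following is only a minimal sketch of one possible reading of "global expected activated pattern" and "hard routing", not the authors' implementation. It assumes a unit's function embedding is the activation-weighted average of the features at the locations where that unit fires, and that routing sends each location to the unit whose embedding is most similar in cosine terms. The names `function_embeddings` and `hard_route`, the tensor shapes, and the choice of cosine similarity are all hypothetical.

```python
import numpy as np

def function_embeddings(acts, feats, eps=1e-8):
    """Assumed reading: one embedding per unit, taken as the
    activation-weighted mean of features over all samples and locations.

    acts:  (N, C, H, W) unit activations
    feats: (N, D, H, W) features at the same spatial locations
    returns (C, D)
    """
    w = np.maximum(acts, 0.0)                       # keep positive activations only
    num = np.einsum("nchw,ndhw->cd", w, feats)      # weighted feature sum per unit
    den = w.sum(axis=(0, 2, 3))[:, None] + eps      # total activation mass per unit
    return num / den

def hard_route(feats, emb, eps=1e-8):
    """Assumed reading: one-hot routing mask (N, C, H, W) that selects,
    at each location, the unit whose embedding best matches the feature."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    e = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + eps)
    sim = np.einsum("ndhw,cd->nchw", f, e)          # cosine similarity per unit
    winner = sim.argmax(axis=1)                     # (N, H, W) index of best unit
    mask = np.eye(emb.shape[0])[winner]             # (N, H, W, C) one-hot
    return np.moveaxis(mask, -1, 1)                 # (N, C, H, W)
```

Under these assumptions, `acts * hard_route(feats, emb)` would zero out every unit except the best-matching one at each location, which is one way semantic specificity could be enforced.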