Abstract: Deep learning (DL) enables deep neural networks (DNNs) to automatically learn complex tasks or rules from given examples without instructions or guiding principles. Because we do not engineer DNNs' functions, it is extremely difficult to diagnose their decisions, and multiple lines of study have sought to explain the principles underlying DNN/DL operations. Notably, one line of study suggests that DNNs may learn concepts, i.e., high-level features recognizable to humans. Thus, we hypothesized that DNNs develop abstract codes, not necessarily recognizable to humans, which can be used to augment DNNs' decision-making. To address this hypothesis, we combined foundation segmentation models and unsupervised learning to extract internal codes and identify potential uses of abstract codes to make DL's decision-making more reliable and safer.

What problem does this paper attempt to address?

This paper investigates the internal symbols of deep neural networks (DNNs), i.e., abstract codes in their hidden layers. These codes may strongly influence how DNNs make decisions, yet they are not necessarily comprehensible to humans. The study hypothesizes that such abstract codes can be used to make DNN decisions more reliable and secure. To test this hypothesis, the researchers combine foundation segmentation models with unsupervised learning to extract and identify these internal codes. They find that these "symbols" mediate semantic meaning and can be used to monitor the model's decision-making, detect adversarial perturbations and abnormal inputs, and temporarily learn from abnormal inputs.

The paper analyzes the hidden-layer responses of DNNs to the ImageNet subset Mixed_13, uses segmentation models to identify regions of interest (ROIs), and applies unsupervised clustering to find recurring hidden-layer responses, i.e., "symbols." The researchers propose that these symbols can be converted into a discrete form, which could help build more reliable and secure DNNs. The methodology consists of three steps:

1. Use segmentation models to identify ROIs and record the corresponding spatial hidden-layer responses.
2. Extract the hidden-layer activation vectors that fall within each ROI and reduce noise through ROI pooling.
3. Apply unsupervised clustering to identify representative activation vectors, i.e., "symbols."

The results show that symbols are related to the semantic meaning of inputs and can predict DNN decisions; in deeper layers, symbols are more closely associated with specific categories. In addition, symbols can be used to estimate the confidence of DNN decisions, to detect abnormal inputs such as adversarial examples and out-of-distribution examples, and to temporarily learn from out-of-distribution samples.
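The three-step pipeline above, plus the use of symbols to predict decisions and flag abnormal inputs, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The activation map and segmentation mask are toy nested lists rather than real DNN or segmentation-model outputs. "ROI pooling" is assumed to mean averaging the activation vectors inside the mask. Plain k-means stands in for the clustering algorithm, which the summary does not name. The distance threshold used to flag abnormal inputs is a hypothetical heuristic. All function names (`roi_pool`, `kmeans`, `assign_symbol`, `symbol_class_table`, `predict_from_symbol`) are illustrative.

```python
import math
import random
from collections import Counter, defaultdict


def roi_pool(activation_map, mask):
    """Average the C-dim activation vectors at positions where the ROI mask is True.

    activation_map: H x W grid of C-dim vectors (toy stand-in for a hidden layer).
    mask: H x W grid of booleans (toy stand-in for a segmentation model's output).
    Assumption: ROI pooling is taken here to mean simple averaging over the ROI.
    """
    selected = [activation_map[i][j]
                for i in range(len(mask))
                for j in range(len(mask[i]))
                if mask[i][j]]
    if not selected:
        raise ValueError("ROI mask selects no positions")
    return [sum(col) / len(selected) for col in zip(*selected)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=50, seed=0):
    """Plain k-means, used here only as a stand-in for the unsupervised clustering step."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = [[sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
                   for c, members in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids


def assign_symbol(vector, centroids):
    """Return (index of the nearest centroid, i.e. the 'symbol', distance to it)."""
    idx = min(range(len(centroids)), key=lambda c: sq_dist(vector, centroids[c]))
    return idx, math.sqrt(sq_dist(vector, centroids[idx]))


def symbol_class_table(vectors, labels, centroids):
    """Count how often each symbol co-occurs with each class label."""
    table = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        table[assign_symbol(v, centroids)[0]][y] += 1
    return table


def predict_from_symbol(vector, centroids, table, max_dist):
    """Predict a label from the input's symbol, with a crude confidence score.

    Hypothetical heuristic: inputs whose nearest symbol is farther than max_dist,
    or whose symbol never appeared in training, are flagged as abnormal by
    returning (None, 0.0).
    """
    symbol, dist = assign_symbol(vector, centroids)
    counts = table.get(symbol)
    if not counts or dist > max_dist:
        return None, 0.0
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


if __name__ == "__main__":
    # Toy pooled ROI vectors for two well-separated classes.
    vecs = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
    labs = ["cat"] * 3 + ["dog"] * 3
    cents = kmeans(vecs, k=2)
    tbl = symbol_class_table(vecs, labs, cents)
    print(predict_from_symbol([0.02, 0.08], cents, tbl, max_dist=1.0))
    print(predict_from_symbol([50.0, 50.0], cents, tbl, max_dist=1.0))
```

Mapping each symbol to the labels it co-occurs with is one simple way to let symbols "predict" decisions, and the majority-label fraction serves as a rough confidence score. Real hidden layers would have far higher dimensionality, where a learned or calibrated threshold would likely be needed instead of a fixed `max_dist`.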
In summary, the paper addresses the problem of understanding and explaining how DNNs make decisions, and of using their internal abstract codes to improve DNN reliability and security.

Searching for internal symbols underlying deep learning

Interpreting Deep Learning Model Using Rule-based Method

Interpreting Deep Learning Models for Knowledge Tracing

Deep learning systems as complex networks

Interpreting Deep Learning: The Machine Learning Rorschach Test?

Library network, a possible path to explainable neural networks

Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

Semantics, Representations and Grammars for Deep Learning

On the Transition from Neural Representation to Symbolic Knowledge

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Generalization ability and Vulnerabilities to adversarial perturbations: Two sides of the same coin

Dissecting Deep Learning Networks—Visualizing Mutual Information

Improving Interpretability of Deep Neural Networks with Semantic Information

Topological Interpretability for Deep-Learning

Explainability Tools Enabling Deep Learning in Future In-Situ Real-Time Planetary Explorations

Transparency and Explanation in Deep Reinforcement Learning Neural Networks

The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence

Explaining Deep Neural Networks by Leveraging Intrinsic Methods

Towards interpretable-by-design deep learning algorithms

Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions

Using drawings and deep neural networks to characterize the building blocks of human visual similarity