Searching for internal symbols underlying deep learning

Jung H. Lee, Sujith Vijayan
2024-05-31
Abstract: Deep learning (DL) enables deep neural networks (DNNs) to automatically learn complex tasks or rules from given examples without instructions or guiding principles. As we do not engineer DNNs' functions, it is extremely difficult to diagnose their decisions, and multiple lines of studies proposed to explain principles of DNNs/DL operations. Notably, one line of studies suggests that DNNs may learn concepts, the high level features recognizable to humans. Thus, we hypothesized that DNNs develop abstract codes, not necessarily recognizable to humans, which can be used to augment DNNs' decision-making. To address this hypothesis, we combined foundation segmentation models and unsupervised learning to extract internal codes and identify potential use of abstract codes to make DL's decision-making more reliable and safer.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper investigates the internal symbols of deep neural networks (DNNs) trained with deep learning (DL), that is, the abstract codes that emerge in the hidden layers. These codes may strongly influence DNN decisions without being comprehensible to humans. The study hypothesizes that such abstract codes can be used to make DNN decisions more reliable and safer. To test this hypothesis, the researchers combine foundation segmentation models with unsupervised learning to extract and identify these internal codes. They find that the "symbols" are linked to the semantic meanings of inputs and can be used to monitor the model's decision-making, detect adversarial perturbations and other abnormal inputs, and temporarily learn abnormal inputs.

The paper analyzes the hidden-layer responses of DNNs to Mixed_13, a subset of ImageNet. It identifies regions of interest (ROIs) with segmentation models and finds recurring hidden-layer responses, i.e., "symbols," through unsupervised clustering analysis. The researchers propose that these symbols can be converted into a discrete form, which could help build more reliable and safer DNNs.

The research methodology includes:
1. Using segmentation models to identify ROIs and record the corresponding spatial hidden-layer responses.
2. Extracting the hidden-layer activation vectors corresponding to the ROIs and reducing noise through ROI pooling.
3. Applying unsupervised clustering analysis to identify representative activation vectors, i.e., "symbols."

The results show that symbols are related to the semantic meanings of inputs and can predict DNN decisions. In deeper network layers, symbols are more closely associated with specific categories. In addition, symbols can be used to evaluate the confidence of DNN decisions, detect abnormal inputs (such as adversarial samples and out-of-distribution examples), and temporarily learn from out-of-distribution samples.
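The extraction steps described above (ROI pooling followed by unsupervised clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The summary does not name the clustering algorithm, so plain k-means is assumed here. The function names `roi_pool`, `kmeans`, and `nearest_symbol` are hypothetical, and using the distance to the nearest symbol as an abnormality score is only one plausible way to read the claim that symbols help flag abnormal inputs. The ROI mask is assumed to be already resized to the spatial grid of the hidden layer.

```python
import numpy as np


def roi_pool(feature_map, mask):
    """Average hidden-layer activations over the positions inside one ROI.

    feature_map: (C, H, W) activations from a single hidden layer.
    mask: (H, W) boolean ROI mask on the same spatial grid (assumption).
    Returns a (C,) activation vector.
    """
    if not mask.any():
        raise ValueError("ROI mask is empty")
    # Boolean indexing over the two spatial axes yields shape (C, n_pixels).
    return feature_map[:, mask].mean(axis=1)


def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).

    vectors: (N, C) float array of ROI-pooled activation vectors.
    Returns a (k, C) array of cluster centers, used here as "symbols".
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=k, replace=False)
    centers = vectors[start].astype(float)
    for _ in range(n_iter):
        # Distance from every vector to every center: shape (N, k).
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def nearest_symbol(vector, symbols):
    """Return (index, distance) of the symbol closest to an activation vector.

    A large distance means the vector resembles none of the stored symbols,
    which this sketch treats as a hint that the input may be abnormal.
    """
    dists = np.linalg.norm(symbols - vector, axis=1)
    idx = int(dists.argmin())
    return idx, float(dists[idx])
```

In use, one would call `roi_pool` once per ROI and layer, stack the resulting vectors into an `(N, C)` array, pass it to `kmeans` to obtain the symbols, and then compare new ROI vectors against them with `nearest_symbol`. The number of clusters `k` and any distance threshold are free parameters that the summary does not specify.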
In summary, the paper addresses the problem of understanding and explaining how DNNs make decisions, and of using their internal abstract codes to make those decisions more reliable and safer.