Abstract: This study aims to prove the emergence of symbolic concepts (or more precisely, sparse primitive inference patterns) in well-trained deep neural networks (DNNs). Specifically, we prove the following three conditions for the emergence. (i) The high-order derivatives of the network output with respect to the input variables are all zero. (ii) The DNN can be used on occluded samples, and when the input sample is less occluded, the DNN will yield higher confidence. (iii) The confidence of the DNN does not significantly degrade on occluded samples. These conditions are quite common, and we prove that under these conditions, the DNN will only encode a relatively small number of sparse interactions between input variables. Moreover, we can consider such interactions as symbolic primitive inference patterns encoded by a DNN, because we show that inference scores of the DNN on an exponentially large number of randomly masked samples can always be well mimicked by numerical effects of just a few interactions.

What problem does this paper attempt to address?

This paper sets out to prove that symbolic concepts (or more precisely, sparse primitive inference patterns) emerge in well-trained deep neural networks (DNNs). The authors state three conditions and prove that, when these conditions hold, a DNN encodes only a small number of sparse interactions between input variables:

1. **Higher-order derivatives are zero**: All higher-order derivatives of the network output with respect to the input variables are zero.
2. **Use of occluded samples**: The DNN can be used on partially occluded samples, and the less occluded the input sample is, the higher the confidence the DNN produces.
3. **Confidence on occluded samples does not significantly drop**: The confidence of the DNN does not significantly decrease on occluded samples.

These conditions are common in many DNNs. Under them, the authors prove that a DNN encodes only a relatively small number of sparse interactions. Moreover, these interactions can be viewed as symbolic primitive inference patterns encoded by the DNN, because the inference scores of the DNN on an exponentially large number of randomly masked samples can be well mimicked by the numerical effects of just a few interactions.

### Detailed Explanation

1. **Higher-order derivatives are zero**:
   - Assumption 1-α: Interaction effects of order higher than \( M \) are zero, i.e., for every subset \( S \subseteq N \) of the input variables with \( |S| \geq M+1 \), \( I(S) = 0 \).
   - Assumption 1-β: The network output has non-zero derivatives of order at most \( M \), i.e., for all \( b \in \mathbb{R}^n \) and all \( \kappa_1, \ldots, \kappa_n \in \mathbb{N} \) with \( \kappa_1 + \cdots + \kappa_n \geq M+1 \), \( \left. \frac{\partial^{\kappa_1 + \cdots + \kappa_n} v}{\partial x_1^{\kappa_1} \cdots \partial x_n^{\kappa_n}} \right|_{x=b} = 0 \).
2. **Use of occluded samples**:
   - Monotonicity Assumption (Assumption 2): The average network output increases monotonically with the size of the unoccluded set \( S \), i.e., for all \( m' \leq m \), \( \bar{u}(m') \leq \bar{u}(m) \), where \( \bar{u}(m) = \mathbb{E}_{|S|=m}[u(S)] \) and \( u(S) = v(x_S) - v(x_\emptyset) \).
3. **Confidence on occluded samples does not significantly drop**:
   - Assumption 3: Given the average network output \( \bar{u}(m) \) on samples with \( m \) unoccluded input variables, the average network output on samples with \( m' \) unoccluded input variables is bounded from below, i.e., for all \( m' \leq m \), \( \bar{u}(m') \geq \left(\frac{m'}{m}\right)^p \bar{u}(m) \), where \( p > 0 \) is a constant.

### Experimental Validation

To validate these assumptions, the authors conducted multiple experiments, including tests on different large language models (LLMs). The results show that most samples satisfy the assumptions, which supports the emergence of sparse interactions in DNNs. For example, in OPT-1.3B, LLaMA-7B, and Aquila-7B, over 84% of the samples satisfy the monotonicity assumption.

### Theoretical Proof

The authors mathematically prove that under the above conditions, the interactions in DNNs are sparse. Specifically, they analyze the upper bound of the sum of all \( k \)-order interaction effects and prove that these interaction effects can be represented in a polynomial form, which further supports the existence of sparsity.

In summary, through mathematical proofs and experimental validation, this paper aims to demonstrate that symbolic concepts, or sparse primitive inference patterns, emerge in well-trained DNNs.
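The assumptions can be made concrete on a toy model. The sketch below is illustrative only and is not the authors' code. It assumes that \( I(S) \) is the Harsanyi interaction, \( I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(x_T) \), a definition this summary does not spell out. The function `toy_v`, the choice of four input variables, and the value `p = 2.5` are hypothetical choices made for illustration. The code enumerates every masked sample, computes the interactions, rebuilds each output from the few non-zero ones, and checks Assumptions 2 and 3 on \( \bar{u}(m) \).

```python
from itertools import combinations
from math import comb


def subsets(items):
    """Yield every subset of `items` as a tuple, smallest subsets first."""
    items = tuple(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def harsanyi_interactions(v, n):
    """Assumed definition: I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def reconstruct(interactions, S):
    """Rebuild v(S) as the sum of the given interactions I(T) with T subset of S."""
    kept = set(S)
    return sum(val for T, val in interactions.items() if set(T) <= kept)


def average_outputs(v, n):
    """u_bar(m): mean of u(S) = v(S) - v(empty set) over all S with |S| = m."""
    base = v(())
    return [
        sum(v(S) - base for S in combinations(range(n), m)) / comb(n, m)
        for m in range(n + 1)
    ]


def is_monotonic(u_bar, tol=1e-9):
    """Assumption 2: u_bar(m') <= u_bar(m) whenever m' <= m."""
    return all(a <= b + tol for a, b in zip(u_bar, u_bar[1:]))


def satisfies_lower_bound(u_bar, p, tol=1e-9):
    """Assumption 3: u_bar(m') >= (m'/m)^p * u_bar(m) for all m' <= m."""
    return all(
        u_bar[mp] >= (mp / m) ** p * u_bar[m] - tol
        for m in range(1, len(u_bar))
        for mp in range(m + 1)
    )


def toy_v(S):
    """Hypothetical model output when only the variables in S are unoccluded."""
    kept = set(S)
    return (
        1.0 * (0 in kept)
        + 0.5 * (1 in kept)
        + 2.0 * ({1, 2} <= kept)
        + 1.5 * ({0, 2, 3} <= kept)
    )


n = 4
interactions = harsanyi_interactions(toy_v, n)
# Keep only the non-negligible interactions: this is the sparse set.
sparse = {S: val for S, val in interactions.items() if abs(val) > 1e-9}
u_bar = average_outputs(toy_v, n)

print(f"{len(sparse)} of {len(interactions)} interactions are non-zero:", sparse)
print("monotonic:", is_monotonic(u_bar))
print("lower bound with p=2.5:", satisfies_lower_bound(u_bar, 2.5))
```

In this toy setting the highest-order non-zero interaction has three variables, so Assumption 1-α holds with \( M = 3 \). Calling `reconstruct(sparse, S)` for any of the 16 masked samples returns `toy_v(S)`, which is the sense in which a few interactions mimic the outputs on all masked samples. The enumeration is exponential in `n`, so this brute-force approach is only practical for small numbers of input variables.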

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models

Where We Have Arrived in Proving the Emergence of Sparse Interaction Primitives in AI Models

Defining and Quantifying the Emergence of Sparse Concepts in DNNs

Does a Neural Network Really Encode Symbolic Concepts?

Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization

Exploring Hidden Semantics in Neural Networks with Symbolic Regression

Analyzing Deep Neural Networks with Symbolic Propagation: Towards Higher Precision and Faster Verification

Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

Emergence of machine language: towards symbolic intelligence with neural networks

Towards the Dynamics of a DNN Learning Symbolic Interactions

Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks

Sparse Linear Concept Discovery Models

Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks

Semantics in Deep Neural-Network Computing

Emergence of Symbols in Neural Networks for Semantic Understanding and Communication

Are Sparse Neural Networks Better Hard Sample Learners?

A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

Universal structural patterns in sparse recurrent neural networks

Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features

Attributing Learned Concepts in Neural Networks to Training Data