Abstract: This study aims to prove the emergence of symbolic concepts (or more precisely, sparse primitive inference patterns) in well-trained deep neural networks (DNNs). Specifically, we prove the following three conditions for the emergence. (i) The high-order derivatives of the network output with respect to the input variables are all zero. (ii) The DNN can be used on occluded samples and when the input sample is less occluded, the DNN will yield higher confidence. (iii) The confidence of the DNN does not significantly degrade on occluded samples. These conditions are quite common, and we prove that under these conditions, the DNN will only encode a relatively small number of sparse interactions between input variables. Moreover, we can consider such interactions as symbolic primitive inference patterns encoded by a DNN, because we show that inference scores of the DNN on an exponentially large number of randomly masked samples can always be well mimicked by numerical effects of just a few interactions.
What problem does this paper attempt to address?
This paper sets out to demonstrate that symbolic concepts (or more precisely, sparse primitive inference patterns) emerge in well-trained deep neural networks (DNNs). Specifically, the authors identify three conditions and prove that, when they hold, a DNN encodes only a relatively small number of sparse interactions between input variables:
1. **Higher-order derivatives are zero**: All higher-order derivatives of the network output with respect to the input variables are zero.
2. **Use of occluded samples**: DNNs can be used on partially occluded samples, and when the input samples are less occluded, DNNs will produce higher confidence.
3. **Confidence on occluded samples does not significantly drop**: The confidence of DNNs on occluded samples does not significantly decrease.
These conditions are common in many DNNs, and the authors prove that under them a DNN encodes only a small number of sparse interactions. Moreover, these interactions can be viewed as symbolic primitive inference patterns encoded by the DNN, because the authors show that the DNN's inference scores on an exponentially large number of randomly masked samples can be closely mimicked by the numerical effects of just a few interactions.
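The claim that a few interactions can mimic the network's outputs on masked samples can be illustrated on a toy example. The sketch below assumes the standard Harsanyi interaction, \( I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(x_T) \), as the interaction measure (this summary does not spell out the definition), and it uses a hypothetical function `toy_v` in place of a real DNN. All names in the code are illustrative.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(v, n):
    """Compute I(S) = sum_{T subset of S} (-1)^(|S|-|T|) * v(T) for all S.

    `v(T)` stands for the model output on the sample in which only the
    variables in T are left unmasked (an assumed interface).
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_v(T):
    """Hypothetical 'network' built from three AND-like patterns."""
    return 2.0 * (0 in T and 1 in T) + 1.0 * (2 in T) + 0.5 * ({1, 2, 3} <= T)


n = 4
interactions = harsanyi_interactions(toy_v, n)

# Keep only the interactions with a non-negligible effect.
salient = {S: val for S, val in interactions.items() if abs(val) > 1e-9}

# Check that summing I(S) over all S inside T reproduces v(T) for every mask T.
reconstructs = all(
    abs(sum(interactions[S] for S in subsets(T)) - toy_v(T)) < 1e-9
    for T in subsets(range(n))
)

print(len(salient), "salient interactions out of", 2 ** n)
print("outputs reproduced on all masked samples:", reconstructs)
```

In this toy case only three of the 16 subsets carry a non-zero effect, yet together they reproduce `toy_v` on every masked sample. The enumeration is exponential in `n`, so this brute-force form is practical only for a small number of input variables.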
### Detailed Explanation
1. **Higher-order derivatives are zero**:
- Assumption 1-α: Interaction effects higher than order M are zero, i.e., for all \( S \subseteq N \) and \( |S| \geq M+1 \), \( I(S) = 0 \).
- Assumption 1-β: The network has non-zero derivatives of order at most M, i.e., all derivatives of order \( M+1 \) or higher vanish: for all \( b \in \mathbb{R}^n \) and \( \kappa_1, \ldots, \kappa_n \in \mathbb{N} \) with \( \kappa_1 + \cdots + \kappa_n \geq M+1 \), \( \frac{\partial^{\kappa_1 + \cdots + \kappa_n} v}{\partial x_1^{\kappa_1} \cdots \partial x_n^{\kappa_n}} \bigg|_{x=b} = 0 \).
2. **Use of occluded samples**:
- Monotonicity Assumption (Assumption 2): The average network output increases monotonically with the size of the unoccluded set \( S \), i.e., for all \( m' \leq m \), \( \bar{u}(m') \leq \bar{u}(m) \), where \( \bar{u}(m) \) is defined as \( \bar{u}(m) = \mathbb{E}_{|S|=m}[u(S)] \), \( u(S) = v(x_S) - v(x_\emptyset) \).
3. **Confidence on occluded samples does not significantly drop**:
- Assumption 3: Given the average network output \( \bar{u}(m) \) of samples with \( m \) unoccluded input variables, we assume a lower bound for the average network output of samples with \( m' \) unoccluded input variables, i.e., for all \( m' \leq m \), \( \bar{u}(m') \geq \left(\frac{m'}{m}\right)^p \bar{u}(m) \), where \( p > 0 \) is a constant.
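
Assumptions 2 and 3 can be checked numerically once \( \bar{u}(m) \) is available. The following is a minimal sketch, assuming a hypothetical set function `toy_v` (here \( v(x_S) = \sqrt{|S|} \)) in place of a real network and exact enumeration of all subsets of each size; the helper names are illustrative and not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def average_outputs(v, n):
    """Return [u_bar(0), ..., u_bar(n)], where u_bar(m) = E_{|S|=m}[v(S) - v(empty)]."""
    base = v(frozenset())
    return [
        mean(v(frozenset(S)) - base for S in combinations(range(n), m))
        for m in range(n + 1)
    ]


def is_monotonic(u_bar):
    """Assumption 2: u_bar(m') <= u_bar(m) whenever m' <= m."""
    return all(a <= b for a, b in zip(u_bar, u_bar[1:]))


def satisfies_lower_bound(u_bar, p, tol=1e-12):
    """Assumption 3: u_bar(m') >= (m'/m)^p * u_bar(m) for all m' <= m (m >= 1)."""
    n = len(u_bar) - 1
    return all(
        u_bar[k] >= (k / m) ** p * u_bar[m] - tol
        for m in range(1, n + 1)
        for k in range(0, m + 1)
    )


def toy_v(S):
    """Hypothetical model output: grows like the square root of the unmasked count."""
    return len(S) ** 0.5


u_bar = average_outputs(toy_v, n=6)
print("monotonic:", is_monotonic(u_bar))
print("lower bound with p = 1.0:", satisfies_lower_bound(u_bar, p=1.0))
print("lower bound with p = 0.25:", satisfies_lower_bound(u_bar, p=0.25))
```

For this toy function the bound holds exactly when \( p \geq 1/2 \), because \( \sqrt{m'} \geq (m'/m)^p \sqrt{m} \) reduces to \( (m'/m)^{1/2} \geq (m'/m)^p \) with \( m'/m \leq 1 \). For a real network, \( \bar{u}(m) \) would typically be estimated by sampling masks rather than enumerating them.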
### Experimental Validation
To validate these assumptions, the authors conducted multiple experiments, including testing these assumptions on different large language models (LLMs). The experimental results show that most samples satisfy these assumptions, thereby supporting the emergence of sparse interactions in DNNs. For example, in models such as OPT-1.3B, LLaMA-7B, and Aquila-7B, over 84% of the samples satisfy the monotonicity assumption.
### Theoretical Proof
The authors prove mathematically that, under the above conditions, the interactions encoded by a DNN are sparse. Specifically, they derive an upper bound on the sum of all k-th order interaction effects and show that these effects can be expressed in polynomial form, which supports the sparsity claim.
In summary, through mathematical proofs and experimental validation, this paper aims to demonstrate that symbolic concepts, or sparse primitive inference patterns, emerge in well-trained DNNs.