Every decision tree has an influential variable

Ryan O'Donnell, Michael Saks, Oded Schramm, Rocco A. Servedio
DOI: https://doi.org/10.48550/arXiv.cs/0508071
2005-08-16
Abstract: We prove that for any decision tree calculating a boolean function $f:\{-1,1\}^n\to\{-1,1\}$, \[ \operatorname{Var}[f] \le \sum_{i=1}^n \delta_i \operatorname{Inf}_i(f), \] where $\delta_i$ is the probability that the $i$th input variable is read and $\operatorname{Inf}_i(f)$ is the influence of the $i$th variable on $f$. The variance, influence and probability are taken with respect to an arbitrary product measure on $\{-1,1\}^n$. It follows that the minimum depth of a decision tree calculating a given balanced function is at least the reciprocal of the largest influence of any input variable. Likewise, any balanced boolean function with a decision tree of depth $d$ has a variable with influence at least $\frac{1}{d}$. The only previous nontrivial lower bound known was $\Omega(d 2^{-d})$. Our inequality has many generalizations, allowing us to prove influence lower bounds for randomized decision trees, decision trees on arbitrary product probability spaces, and decision trees with non-boolean outputs. As an application of our results we give a very easy proof that the randomized query complexity of nontrivial monotone graph properties is at least $\Omega(v^{4/3}/p^{1/3})$, where $v$ is the number of vertices and $p \leq \frac{1}{2}$ is the critical threshold probability. This supersedes the milestone $\Omega(v^{4/3})$ bound of Hajnal and is sometimes superior to the best known lower bounds of Chakrabarti-Khot and Friedgut-Kahn-Wigderson.
Computational Complexity, Discrete Mathematics, Probability
What problem does this paper attempt to address?
The paper studies how the influences of individual input variables constrain the decision-tree complexity of Boolean functions. Specifically:
1. **Variance Bound in Terms of Influences**: The main result is that for any decision tree computing a Boolean function \( f: \{-1, 1\}^n \to \{-1, 1\} \), the variance \( \text{Var}[f] \) is bounded above by a sum of the variable influences weighted by their query probabilities: \[ \text{Var}[f] \leq \sum_{i = 1}^{n} \delta_i \cdot \text{Inf}_i(f) \] where \( \delta_i \) is the probability that the \( i \)-th input variable is read and \( \text{Inf}_i(f) \) is the influence of the \( i \)-th variable on \( f \). The variance, influences and probabilities are taken with respect to an arbitrary product measure on \( \{-1, 1\}^n \).
2. **Relationship between Decision-Tree Depth and Maximum Influence**: From this inequality, the paper concludes that the minimum decision-tree depth of any balanced Boolean function is at least the reciprocal of its maximum variable influence. For a balanced function \( \text{Var}[f] = 1 \), and \( \sum_{i} \delta_i \) is the expected number of variables read, which is at most the depth \( d \); hence \( 1 \leq d \cdot \max_i \text{Inf}_i(f) \). That is, if a balanced Boolean function \( f \) has a decision tree of depth \( d \), then some variable has influence at least \( \frac{1}{d} \). The only previously known nontrivial lower bound was \( \Omega(d 2^{-d}) \).
3. **Extension to Randomized Decision Trees**: The paper generalizes the inequality to randomized decision trees, decision trees on arbitrary product probability spaces, and decision trees with non-Boolean outputs.
4. **Randomized Query Complexity of Graph Properties**: As an application, the paper gives a very easy proof that the randomized query complexity of nontrivial monotone graph properties is at least \( \Omega\left(\frac{v^{4/3}}{p^{1/3}}\right) \), where \( v \) is the number of vertices and \( p \leq \frac{1}{2} \) is the critical threshold probability. This supersedes the milestone \( \Omega(v^{4/3}) \) bound of Hajnal and is sometimes better than the best known lower bounds of Chakrabarti-Khot and Friedgut-Kahn-Wigderson.
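To make the main inequality concrete, the following is a minimal brute-force check on one small example. It is only a sketch under stated assumptions, not code from the paper: it uses the uniform measure on \( \{-1, 1\}^n \), takes \( \text{Inf}_i(f) \) to be the probability that flipping the \( i \)-th bit changes \( f \) (the usual definition in the uniform setting), and encodes a decision tree as nested tuples `(i, subtree_if_minus1, subtree_if_plus1)` with integer leaves. The names `evaluate`, `osss_terms` and `MAJ3` are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def evaluate(tree, x):
    """Walk the tree on input x; return (output, set of queried indices)."""
    queried = set()
    node = tree
    while not isinstance(node, int):  # internal nodes are tuples, leaves are +1/-1
        i, if_minus, if_plus = node
        queried.add(i)
        node = if_plus if x[i] == 1 else if_minus
    return node, queried


def osss_terms(tree, n):
    """Return (Var[f], [delta_i], [Inf_i]) under the uniform measure (assumption)."""
    points = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(points))
    mean = sum(evaluate(tree, x)[0] for x in points) * weight
    variance = 1 - mean ** 2  # f takes values in {-1, 1}, so E[f^2] = 1
    delta = [Fraction(0)] * n
    inf = [Fraction(0)] * n
    for x in points:
        value, queried = evaluate(tree, x)
        for i in range(n):
            if i in queried:
                delta[i] += weight  # variable i is read on this input
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if evaluate(tree, flipped)[0] != value:
                inf[i] += weight  # flipping bit i changes the output
    return variance, delta, inf


# Majority of three bits: read x0 and x1, and read x2 only if they disagree.
MAJ3 = (0, (1, -1, (2, -1, 1)), (1, (2, -1, 1), 1))

variance, delta, inf = osss_terms(MAJ3, 3)
bound = sum(d * v for d, v in zip(delta, inf))
print(variance, delta, inf, bound)
```

For this tree the variance is 1, while the right-hand side is \( 1 \cdot \tfrac12 + 1 \cdot \tfrac12 + \tfrac12 \cdot \tfrac12 = \tfrac54 \), so the inequality holds with some slack. The tree has depth 3 and every variable has influence \( \tfrac12 \geq \tfrac13 \), consistent with the depth corollary.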
In summary, the paper introduces an inequality that relates the decision-tree complexity of a Boolean function to the influences of its variables, and uses it to improve earlier results, including the \( \Omega(d 2^{-d}) \) influence bound and Hajnal's \( \Omega(v^{4/3}) \) bound for monotone graph properties.