Sharpening Occam's Razor

Ming Li, John Tromp, Paul Vitanyi
DOI: https://doi.org/10.48550/arXiv.cs/0201005
2002-10-11
Abstract: We provide a new representation-independent formulation of Occam's razor theorem, based on Kolmogorov complexity. This new formulation allows us to: (i) Obtain better sample complexity than both length-based and VC-based versions of Occam's razor theorem, in many applications. (ii) Achieve a sharper reverse of Occam's razor theorem than previous work. Specifically, we weaken the assumptions made in an earlier publication, and extend the reverse to superpolynomial running times.
Machine Learning, Disordered Systems and Neural Networks, Artificial Intelligence, Computational Complexity, Probability, Data Analysis, Statistics and Probability
What problem does this paper attempt to address?
The core problem this paper addresses is how to improve and generalize the Occam's razor theorem. Specifically, the authors use Kolmogorov complexity to give a new, representation-independent formulation of the theorem, with two goals:

1. **Better sample complexity**: Compared with the traditional length-based and VC-dimension-based versions, the new formulation yields better sample complexity in many applications.
2. **Sharper converse theorems**: The authors weaken the assumptions made in previous work and extend the converse theorems to superpolynomial running times.

### Background of the paper

The Occam's razor theorem plays an important role in PAC learning theory. Its core idea is that simplicity suffices for learning: data compression is a sufficient condition for efficient PAC learning. The necessity of compression has also been partially proven (Board and Pitt, 1990). However, these results depend on specific representation formats, which limits their applicability and generality.

### Main contributions

1. **New formulation based on Kolmogorov complexity**:
   - The authors introduce Kolmogorov complexity as the measure of compression, which leads to a representation-independent Occam's razor theorem.
   - The new formulation not only gives better sample complexity, but is also more convenient than length-based methods when dealing with discrete problems.
2. **Improved sample complexity**:
   - Using Kolmogorov complexity, the authors show how to reduce sample complexity significantly in some cases. For example, in DNA sequencing applications, the sample complexity can be reduced by about a factor of 7 compared with the length-based method.
3. **Wider converse theorems**:
   - The authors relax the requirements on polynomial time and sample complexity made in previous work, so that the converse theorems apply to a wider range of situations.

### Formula summary

- Upper bound on sample complexity in the VC-dimension version:
  \[ m(H, \delta, \epsilon) \leq \frac{4}{\epsilon} \left(d \log_2 \frac{12}{\epsilon} + \log_2 \frac{1}{\delta}\right) \]
- Lower bound on sample complexity (Ehrenfeucht et al., 1989):
  \[ m(H, \delta, \epsilon) > \max\left(\frac{d - 1}{32\epsilon}, \frac{1}{\epsilon} \ln \frac{1}{\delta}\right) \]
- Sample complexity in the length-based version:
  \[ m = \max\left(\frac{2}{\epsilon} \ln \frac{1}{\delta}, \left(\frac{(2 \ln 2)\, s^{\beta}}{\epsilon}\right)^{\frac{1}{1 - \alpha}}\right) \]
- Sample complexity in the Kolmogorov complexity version:
  \[ m(n, s, \epsilon, \delta) = \max\left\{\frac{2}{\epsilon} \ln \frac{2}{\delta}, f^{-1}\left(\frac{2 \ln 2}{\epsilon}, n, s, \frac{\delta}{2}\right)\right\} \]

Through these improvements, the authors not only improve the theoretical sample complexity, but also provide more effective learning algorithms for practical applications.
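
To get a feel for how the first three bounds in the formula summary compare, they can be evaluated numerically. The following is a minimal sketch, not code from the paper. It makes three assumptions: d is the VC dimension of H, ε is the error parameter and δ is the confidence parameter; the size term in the length-based bound is read as s^β, the usual form for an (α, β)-Occam algorithm with 0 ≤ α < 1; and the function names are hypothetical. The Kolmogorov complexity version is left out because the summary does not define f.

```python
import math


def vc_upper_bound(d: int, epsilon: float, delta: float) -> float:
    """Upper bound on sample complexity, VC-dimension version (as quoted above)."""
    return (4.0 / epsilon) * (
        d * math.log2(12.0 / epsilon) + math.log2(1.0 / delta)
    )


def vc_lower_bound(d: int, epsilon: float, delta: float) -> float:
    """Lower bound on sample complexity (Ehrenfeucht et al., 1989), as quoted above."""
    return max((d - 1) / (32.0 * epsilon), math.log(1.0 / delta) / epsilon)


def length_based_bound(
    s: float, alpha: float, beta: float, epsilon: float, delta: float
) -> float:
    """Length-based Occam bound, assuming the size term is s**beta."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    confidence_term = (2.0 / epsilon) * math.log(1.0 / delta)
    size_term = ((2.0 * math.log(2.0)) * s**beta / epsilon) ** (1.0 / (1.0 - alpha))
    return max(confidence_term, size_term)


# Illustrative parameter values only; they are not taken from the paper.
d, s, epsilon, delta = 10, 10, 0.1, 0.05
print("VC upper bound:    ", math.ceil(vc_upper_bound(d, epsilon, delta)))
print("VC lower bound:    ", math.ceil(vc_lower_bound(d, epsilon, delta)))
print("Length-based bound:", math.ceil(length_based_bound(s, 0.0, 1.0, epsilon, delta)))
```

Because the length-based bound is raised to the power 1/(1 - α), it grows quickly as α approaches 1, which is one reason a tighter measure of compression can pay off.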