A Mathematical Analysis of Benford's Law and its Generalization

Alex E. Kossovsky, Wayne M. Lawton
2023-08-11
Abstract: We explain Kossovsky's generalization of Benford's law which is a formula that approximates the distribution of leftmost digits in finite sequences of natural data and apply it to six sequences of data including populations of US cities and towns and times between earthquakes. We model the natural logarithms of these two data sequences as samples of random variables having normal and reflected Gumbel densities respectively. We show that compliance with the general law depends on how nearly constant the periodized density functions are and that the models are generally more compliant than the natural data. This surprising result suggests that the generalized law might be used to improve density estimation which is the basis of statistical pattern recognition, machine learning and data science.
Methodology, Probability, Statistics Theory, Data Analysis, Statistics and Probability, Applications
What problem does this paper attempt to address?
The paper examines Benford's Law and its generalization, and uses theoretical analysis and empirical study to test how well the generalized law describes several natural datasets. It addresses the following points:

1. **Generalizing Benford's Law**: The authors explain Kossovsky's generalized form of Benford's Law (General Law of Relative Quantities, GLORQ), which extends the classic first-digit law to a broader range of settings.
2. **Theoretical Foundation**: The paper derives the generalized law mathematically and states the corresponding formula (Formula 7). It also discusses the conditions under which the law holds, in particular how they relate to the range of variation of the data sequence (defined by \(R_{0.01}(s)\)).
3. **Empirical Analysis**: The paper applies the generalized law to six natural datasets (including city populations, company market values, and earthquake inter-event times) and measures the goodness of fit between each dataset and the law.
4. **Statistical Modeling**: The paper fits statistical models to two of the datasets. City populations are modeled with a log-normal distribution; the natural logarithms of earthquake inter-event times are modeled with a reflected Gumbel density. The model parameters are estimated, and hypothesis tests (such as the Kolmogorov-Smirnov test) are used to evaluate how well each model fits the data.
5. **Theoretical Extension**: The paper also discusses the conditions under which random variables satisfy the generalized law and uses mathematical tools such as Fourier transforms to analyze these conditions.
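The empirical check in points 3 and 4 can be illustrated for the classic base-10 case, where the expected frequency of leading digit \(d\) is \(\log_{10}(1 + 1/d)\). The following is a minimal sketch, not the paper's procedure: it does not implement the generalized law (Formula 7 is not reproduced here), the log-normal parameters are made-up values rather than estimates from the paper, and the sum of squared deviations is an assumed goodness-of-fit measure chosen only for illustration.

```python
import math
import random
from collections import Counter


def benford_probabilities():
    """Classic Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Leftmost nonzero decimal digit of a positive number."""
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{x:.15e}"[0])


def digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def sum_squared_deviation(observed, expected):
    """Sum of squared differences between two digit distributions."""
    return sum((observed[d] - expected[d]) ** 2 for d in range(1, 10))


# Illustrative log-normal sample. The values mu=8.0 and sigma=1.8 are
# hypothetical, not parameters estimated in the paper.
rng = random.Random(0)
sample = [rng.lognormvariate(8.0, 1.8) for _ in range(50_000)]

observed = digit_frequencies(sample)
expected = benford_probabilities()
for d in range(1, 10):
    print(d, round(observed[d], 4), round(expected[d], 4))
print("SSD:", sum_squared_deviation(observed, expected))
```

Replacing `sample` with a real data sequence (for example, a list of city populations) gives the corresponding comparison for natural data.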
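In summary, the paper combines theoretical analysis with empirical tests to assess how well the generalized Benford's Law describes natural datasets. It finds that compliance depends on how nearly constant the periodized density functions are, and that the fitted models are generally more compliant than the natural data.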
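The abstract's statement that compliance depends on how nearly constant the periodized density is can be made concrete in the classic base-10 setting. If \(Y = \log_{10} X\) has density \(f\), the periodized density is \(g(u) = \sum_{k} f(u + k)\) for \(u \in [0, 1)\), and \(X\) follows Benford's law exactly when \(g\) is identically 1. The sketch below assumes a normal density for \(Y\) with hypothetical parameters and uses period 1 on the \(\log_{10}\) scale; the paper works with natural logarithms and the generalized law, so this is an analogy rather than its exact construction.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal random variable with mean mu and std sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def periodized_density(u, mu, sigma, k_range=50):
    """Sum of the density of log10(X) over integer shifts of u in [0, 1)."""
    center = round(mu)  # center the shifts where the density has its mass
    return sum(
        normal_pdf(u + k, mu, sigma)
        for k in range(center - k_range, center + k_range + 1)
    )


def max_deviation_from_uniform(mu, sigma, grid=200):
    """Largest |g(u) - 1| over an evenly spaced grid on [0, 1)."""
    return max(
        abs(periodized_density(i / grid, mu, sigma) - 1.0)
        for i in range(grid)
    )


# Hypothetical parameters: a wider spread on the log scale should give
# a flatter periodized density, hence closer compliance.
for sigma in (0.2, 0.5, 1.0):
    print(sigma, max_deviation_from_uniform(mu=3.0, sigma=sigma))
```

By the Poisson summation formula, the Fourier coefficients of \(g\) are the values of the Fourier transform of \(f\) at the integers, which is why Fourier transforms are a natural tool for analyzing these conditions.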