Abstract: It turns out that some empirical facts in Big Data are the effects of properties of large numbers. Zipf's law 'noise' is an example of such an artefact. We expose several properties of the power law distributions and of similar distributions that occur when the population is finite and the rank and counts of elements in the population are natural numbers. We are particularly concerned with the low-rank end of the graph of the law, the potential of noise in the law, and with the approximation of the number of types of objects at various ranks. Approximations instead of exact solutions are the center of attention. Consequences in the interpretation of Zipf's law are discussed.

What problem does this paper attempt to address?

The paper argues that some empirical facts observed in big data are effects of the properties of large numbers, and it treats the "noise" in Zipf's law as one such artefact. The author examines several properties of power-law distributions, and of similar distributions, that arise when the population is finite and the ranks and counts of its elements are natural numbers. The paper focuses on the low-rank end of the graph of the law, the potential for noise in the law, and the approximation of the number of object types at various ranks. Approximate solutions, rather than exact ones, are the center of attention, and the consequences for the interpretation of Zipf's law are discussed. Specifically, the paper explores the following aspects:

1. **Properties of power-law distributions**: The characteristics that power-law distributions exhibit when ranks and counts are restricted to natural numbers.
2. **Noise at the low-rank end**: Why noise appears at the low-rank end of the graph of the law, and whether this noise really reflects uncertainty in the data.
3. **Approximation of the number of object types**: How to estimate the number of object types at different ranks.
4. **The impact of merging two power-law populations**: How the Zipf's law graph changes when two power-law populations with the same or different exponents are merged.
5. **Noise in power-law distributions**: How noise can be introduced into a power-law distribution and how it affects ranks and counts.
6. **Hapax legomena and related indicators**: The significance and limitations of hapax legomena (words that appear only once) and of the Honoré, Sichel, and other indicators in text analysis.
Overall, the paper uses mathematical and statistical methods to examine how power-law distributions behave in big data, and in particular the effectiveness and limitations of Zipf's law in practical applications.
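The integer-count effect described above can be illustrated with a small sketch. This is not the paper's method: it assumes an idealized law in which the count at rank r is the integer part of `top_count / r**exponent`, and the names `zipf_counts` and `types_per_count` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter


def zipf_counts(n_types, top_count, exponent=1.0):
    """Integer counts for ranks 1..n_types under an idealized Zipf law.

    Assumption (not from the paper): the count at rank r is the integer
    part of top_count / r**exponent, with a floor of 1 so that every
    type occurs at least once.
    """
    return [
        max(1, int(top_count / rank ** exponent))
        for rank in range(1, n_types + 1)
    ]


def types_per_count(counts):
    """Map each count value to the number of types that share it."""
    return Counter(counts)


counts = zipf_counts(n_types=500, top_count=500)
spectrum = types_per_count(counts)

# Types that occur exactly once (hapax legomena in a text setting).
hapax = spectrum[1]

print("distinct count values:", len(spectrum))
print("types with count 1:", hapax)
print("types with count 2:", spectrum[2])
```

With these parameters, ranks 251 to 500 all receive a count of 1, so half of the types are tied at the bottom of the graph. The long flat steps appear even though no randomness was added, which is the kind of artefact that can be mistaken for noise when counts must be natural numbers.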

Big Data and Large Numbers. Interpreting Zipf's Law

Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems

Universal emergence of local Zipf's law

Dynamical approach to Zipf's law

Zipf's Law for Atlas Models

Zipf's and Taylor's Laws

Maximal Diversity and Zipf's Law

Large-Scale Analysis of Zipf’s Law in English Texts

Zipf's law, power laws, and maximum entropy

The Mathematical Relationship Between Zipf's Law and the Hierarchical Scaling Law

Zipf's Law for Cities: An Explanation

Zipf's law, 1/f noise, and fractal hierarchy

The common patterns of abundance: the log series and Zipf's law

Zipf's Law for All the Natural Cities around the World

Log-log Convexity of Type-Token Growth in Zipf's Systems

A sensible proof connecting the scale-free feature with the Zipf-law

On the emergence of Zipf's law in music

Relating Zipf's law to textual information

Universality of Zipf's Law

Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

The Rank-Size Scaling Law and Entropy-Maximizing Principle