Proper Interpretation of Heaps' and Zipf's Laws

Kim Chol-jun
2024-05-26
Abstract: We checked that the distribution of words in a text should be uniform, which gives Heaps' law as a natural result: the number of word types can be expressed as a power law of the number of tokens in the text. We developed a "superposition" model, which leads to an asymptotic power-law distribution of the number of occurrences (or frequency) of words, that is, Zipf's law. The model is consistent with observations.
Physics and Society
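
The two quantities named in the abstract can be measured on any tokenized text. The following is a minimal sketch, not the paper's superposition model: it only counts types against tokens (the curve that Heaps' law describes) and sorts word occurrence counts by rank (the distribution that Zipf's law describes). The function names and the toy sentence are illustrative assumptions, and a sample this small is far too short to show a power law.

```python
from collections import Counter


def heaps_curve(tokens):
    """Return the number of distinct word types seen after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def rank_frequency(tokens):
    """Return occurrence counts sorted from most to least frequent word."""
    return sorted(Counter(tokens).values(), reverse=True)


# Toy input (hypothetical); a real test would use a long corpus.
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

curve = heaps_curve(tokens)     # types as a function of tokens read
freqs = rank_frequency(tokens)  # occurrence count at rank 1, 2, 3, ...
```

On a long corpus, plotting `curve` against the token index and `freqs` against rank, both on log-log axes, is the usual way to check whether the two relations are approximately straight lines.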