Universal Complex Structures in Written Language

Alvaro Corral, Ramon Ferrer-i-Cancho, Gemma Boleda, Albert Diaz-Guilera
DOI: https://doi.org/10.48550/arXiv.0901.2924
2009-01-20
Abstract: Quantitative linguistics has provided us with a number of empirical laws that characterise the evolution of languages and competition amongst them. In terms of language usage, one of the most influential results is Zipf's law of word frequencies. Zipf's law appears to be universal, and may not even be unique to human language. However, there is ongoing controversy over whether Zipf's law is a good indicator of complexity. Here we present an alternative approach that puts Zipf's law in the context of critical phenomena (the cornerstone of complexity in physics) and establishes the presence of a large scale "attraction" between successive repetitions of words. Moreover, this phenomenon is scale-invariant and universal -- the pattern is independent of word frequency and is observed in texts by different authors and written in different languages. There is evidence, however, that the shape of the scaling relation changes for words that play a key role in the text, implying the existence of different "universality classes" in the repetition of words. These behaviours exhibit striking parallels with complex catastrophic phenomena.
Physics and Society, Computation and Language
What problem does this paper attempt to address?
This paper explores complex structure in written language, in particular whether the distribution of distances between repeated occurrences of a word follows a universal pattern. Specifically, the authors focus on:
1. **Distance distribution of word repetitions**: Study the distances between repetitions of a word in a text (i.e., the number of words between two consecutive occurrences of the same word) and test whether they follow a statistical law, especially one that is universal, that is, independent of the author and the language.
2. **Relationship with Zipf's law**: Examine how these findings relate to known linguistic laws such as Zipf's law, which relates the frequency \( f_w \) of a word to its rank \( r_w \) through \( f_w \propto 1/r_w^\alpha \). Zipf's law describes a static property of language; the authors aim to understand the dynamics of language generation as well.
3. **Behavioral differences of different types of words**: Study whether different types of words (such as verbs, adjectives, nouns, and pronouns) behave differently when they are repeated, and whether these differences define distinct "universality classes".
4. **Similarity with natural phenomena**: Examine whether the pattern of word repetitions resembles the time-interval distributions of natural phenomena such as earthquakes, which would point to possible common mechanisms between human behavior and natural phenomena.
By addressing these questions, the authors aim to better understand the complexity and dynamics of language and the mechanisms that may underlie them.
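
The two quantities discussed above, the distance between consecutive occurrences of a word and the rank-frequency relation behind Zipf's law, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the simple regex tokenizer, and the sample sentence are assumptions made for illustration. The distance follows the definition given in this summary (the number of words strictly between two occurrences); a convention based on the difference in positions would give values larger by one.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens (a simplistic, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def repetition_distances(tokens):
    """Map each repeated word to the gaps between its consecutive occurrences.

    A gap counts the words strictly between two occurrences, so an
    immediate repetition has gap 0. Words that occur once have no entry.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for position, word in enumerate(tokens):
        if word in last_seen:
            gaps[word].append(position - last_seen[word] - 1)
        last_seen[word] = position
    return dict(gaps)


def rank_frequency(tokens):
    """Return (rank, word, count) tuples ordered by decreasing count."""
    counts = Counter(tokens)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical sample text, far too short for any statistical conclusion.
sample = "the cat saw the dog and the dog saw the cat"
tokens = tokenize(sample)
distances = repetition_distances(tokens)
ranking = rank_frequency(tokens)

print(distances["the"])  # gaps between successive occurrences of "the"
print(ranking[0])        # most frequent word with its rank and count
```

On a real corpus, one would collect the gaps for many words and compare their distributions across words of different frequency, which is the kind of comparison the universality claim refers to.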