Abstract: Zipf's law is a paradigm describing the importance of different elements in communication systems, especially in linguistics. Despite the complexity of the hierarchical structure of language, music has in some sense an even more complex structure, due to its multidimensional character (melody, harmony, rhythm, timbre...). Thus, the relevance of Zipf's law in music is still an open question. Using discrete codewords representing harmonic content obtained from a large-scale analysis of classical composers, we show that a nearly universal Zipf-like law holds at a qualitative level. However, in an in-depth quantitative analysis, where we introduce the double power-law distribution as a new player in the classical debate between the superiority of Zipf's (power) law and that of the lognormal distribution, we conclude not only that universality does not hold, but that there is not a unique probability distribution that best describes the usage of the different codewords by each composer.
What problem does this paper attempt to address?
This paper explores regularities in the frequency distributions found in music. In particular, it asks whether the frequencies of harmonic codewords in classical music follow a Zipf's law like the one observed in linguistics. The authors analyze a large collection of MIDI files of classical works, extract discrete codewords that represent harmonic content, and study the statistics of how often each codeword occurs. Besides the simple power-law distribution, the paper introduces the double power-law distribution and compares both with the lognormal distribution to determine which best describes the data.
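As a rough illustration of what "codewords" and "occurrence frequencies" mean here, the sketch below maps the MIDI notes sounding in one time window to a 12-dimensional binary vector and builds a rank-frequency table. It assumes that each dimension stands for a pitch class (C = 0, ..., B = 11) and that the MIDI files have already been cut into time windows. The function names `pitch_class_codeword` and `rank_frequency` are hypothetical and are not taken from the paper's code.

```python
from collections import Counter


def pitch_class_codeword(midi_notes):
    """Map the MIDI note numbers sounding in one time window to a
    12-dimensional binary pitch-class vector (assumed: C=0, C#=1, ..., B=11)."""
    vector = [0] * 12
    for note in midi_notes:
        vector[note % 12] = 1  # octave equivalence: only the pitch class matters
    return tuple(vector)  # tuples are hashable, so codewords can be counted


def rank_frequency(windows):
    """Count how often each codeword occurs and return (rank, codeword, count)
    triples, from the most to the least frequent codeword."""
    counts = Counter(pitch_class_codeword(w) for w in windows)
    return [(rank, cw, n) for rank, (cw, n) in enumerate(counts.most_common(), start=1)]


# Toy input (hypothetical): each inner list holds the MIDI notes active in one window.
windows = [
    [60, 64, 67],      # C major triad (C4, E4, G4)
    [48, 64, 67, 72],  # same pitch classes, different voicing
    [62, 65, 69],      # D minor triad
    [60, 64, 67],
]
for rank, codeword, count in rank_frequency(windows):
    print(rank, codeword, count)
```

Because of octave equivalence, the two voicings of the C major triad collapse into the same codeword. Zipf's law concerns how the counts in this table fall off with rank once the table is built from a composer's whole corpus.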
### Main research questions:
1. **Does the frequency distribution in music follow Zipf's law?** The paper asks whether the frequencies of harmonic codewords in classical music conform to Zipf's law, that is, whether their distribution shows power-law behavior.
2. **Is the frequency distribution universal across composers?** The authors analyze the works of individual composers to determine whether a common law holds or whether the distributions differ from composer to composer.
3. **Does the double power-law distribution describe musical frequencies better?** The paper introduces the double power-law distribution and tests whether it describes the frequency distributions in music better than the simple power law and the lognormal distribution.
### Research methods:
- **Data sources**: Use 17,419 MIDI files from the Kunstderfuge database, covering 76 composers from the 12th to the 20th century.
- **Data processing**: Preprocess the MIDI files, extract the harmonic codeword of each time interval, and encode it as a 12-dimensional binary vector.
- **Statistical analysis**: Fit the power-law, double power-law, and lognormal distributions by maximum likelihood estimation, and evaluate the quality of each fit with the Kolmogorov-Smirnov test.
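The fitting step can be sketched for the simplest case. The code below estimates the exponent of a *continuous* power law above a fixed threshold `x_min` by maximum likelihood, then computes the Kolmogorov-Smirnov distance between the data and the fitted model. This is a minimal sketch of the generic recipe and not the paper's exact procedure. Codeword counts are discrete, so the continuous form is only an approximation. Choosing `x_min`, obtaining a goodness-of-fit p-value (usually by Monte Carlo resampling), and fitting the double power-law and lognormal models, which generally need numerical optimization, are all left out. The function names are hypothetical.

```python
import math
import random


def fit_power_law(data, x_min):
    """MLE of alpha for the continuous power law
    f(x) = (alpha - 1) / x_min * (x / x_min) ** (-alpha), x >= x_min."""
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    alpha = 1.0 + len(tail) / log_sum  # closed-form maximum-likelihood estimate
    return alpha, tail


def ks_distance(tail, alpha, x_min):
    """Kolmogorov-Smirnov distance between the empirical CDF of `tail`
    and the fitted CDF F(x) = 1 - (x / x_min) ** (1 - alpha)."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - model, model - (i - 1) / n)
    return d


# Sanity check on synthetic data drawn by inverse-transform sampling (alpha = 2.5).
rng = random.Random(0)
true_alpha, x_min = 2.5, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)]
alpha_hat, tail = fit_power_law(sample, x_min)
print(alpha_hat, ks_distance(tail, alpha_hat, x_min))
```

On synthetic data the estimate should land close to the true exponent, and the KS distance should be small. A large KS distance on real codeword counts is the kind of evidence that would indicate a simple power law does not fit.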
### Main findings:
- **Limitations of the simple power-law distribution**: For most composers, a simple power law does not describe the data well, especially in the low-frequency range.
- **Advantages of the double power-law distribution**: In most cases the double power-law distribution fits better and covers a wider range of the data.
- **Diversity of frequency distributions**: Different composers show different frequency-distribution characteristics, and no single probability distribution applies to all composers.
### Conclusion:
Through large-scale data analysis, the paper reveals the complexity of the frequency distribution of harmonic codewords in classical music. Qualitatively, the distribution resembles Zipf's law. Quantitatively, however, the double power-law distribution often describes the data more accurately than a simple power law, although no single distribution is best for every composer. The frequency distributions also differ markedly across composers, which suggests that the structure of music is more complex and diverse than that of language.