Rank-frequency distribution of natural languages: a difference of probabilities approach

Germinal Cocho, R. F. Rodríguez, Sergio Sánchez, Jorge Flores, Carlos Pineda, Carlos Gershenson
DOI: https://doi.org/10.1016/j.physa.2019.121795
2018-11-23
Abstract: The time variation of the rank $k$ of words for six Indo-European languages is obtained using data from Google Books. For low ranks the languages behave differently, possibly because of syntax rules, whereas for $k>50$ the law of large numbers predominates. The dynamics of $k$ is described stochastically through a master equation governing the time evolution of its probability density, which is approximated by a Fokker-Planck equation that is solved analytically. The difference between the data and the asymptotic solution is identified with the transient solution, and good agreement is obtained.
Physics and Society, Applications