Statistical Physics of Language Maps in the USA

James Burridge, Bert Vaux, Michal Gnacik, Yoana Grudeva
DOI: https://doi.org/10.1103/PhysRevE.99.032305
2018-11-21
Abstract: Spatial linguistic surveys often reveal well defined geographical zones where certain linguistic forms are dominant over their alternatives. It has been suggested that these patterns may be understood by analogy with coarsening in models of two dimensional physical systems. Here we investigate this connection by comparing data from the Cambridge Online Survey of World Englishes to the behaviour of a generalised zero temperature Potts model with long range interactions. The relative displacements of linguistically similar population centres reveal enhanced east-west affinity. Cluster analysis reveals three distinct linguistic zones. We find that when the interaction kernel is made anisotropic by stretching along the east-west axis, the model can reproduce the three linguistic zones for all interaction parameters tested. The model results are consistent with a view held by some linguists that, in the USA, language use is, or has been, exchanged or transmitted to a greater extent along the east-west axis than the north-south.
Physics and Society; Data Analysis, Statistics and Probability
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper attempts to explain the geographical distribution patterns observed in the American language map using methods from statistical physics. Specifically, the researchers test the following hypotheses:

1. **Can the geographical boundaries of linguistic features be treated as analogous to domain walls in physical systems?** The researchers expect such domain walls to straighten gradually over time and to be repelled by population centers, which would make the geographical distribution of language use substantially predictable.
2. **Is language use exchanged or transmitted more along the east-west axis than along the north-south axis in the United States?** The researchers ask whether introducing anisotropy (stronger interactions in the east-west direction) allows the model to reproduce the three main linguistic regions observed in the survey data.

To test these hypotheses, the researchers compared data from the Cambridge Online Survey of World Englishes (COSWE) with a generalized zero-temperature Potts model. The model includes long-range interactions and can represent directional differences in language propagation through the anisotropy of its interaction kernel.

### Main Methods

1. **Data Collection and Processing**:
   - Using the COSWE dataset, which includes responses from approximately 58,000 geographical locations in the eastern United States.
   - Clustering survey responses with the Mean-Shift algorithm to generate 300 nodes, each representing a linguistic community.
   - Defining a language frequency vector for each node that describes how respondents at that node answered each survey question.
2. **Model Construction**:
   - Constructing a generalized zero-temperature Potts model in which the state of each node represents its linguistic cluster.
   - Defining the interaction kernel as a truncated Cauchy distribution to simulate long-range interactions.
   - Introducing an anisotropy parameter A to adjust the interaction strength in the east-west direction.
3. **Result Analysis**:
   - Using the Metropolis algorithm to simulate the evolution of language states.
   - Comparing the simulation results with the survey data using the K-means clustering method.
   - Analyzing the model results under different parameter settings to evaluate the robustness and accuracy of the model.

### Main Findings

1. **The model can reproduce the observed linguistic regions**:
   - In the isotropic case (A=1), the model generates linguistic regions similar to the survey data only under certain parameter settings.
   - When anisotropy (A=1.15) is introduced, the model reproduces the observed linguistic regions more robustly, which is consistent with greater exchange of language use along the east-west axis.
2. **Impact of social conformity and population distribution**:
   - Social conformity (the tendency of people to conform to those around them) and uneven population distribution are important factors in explaining the geographical distribution of language.
   - Spatially adjacent nodes are more similar in language, but the degree of similarity depends on social conformity and population distribution.
3. **Comparison with the Voronoi null model**:
   - A simple Voronoi null model, which assumes only that geographically adjacent nodes are similar in language, cannot reproduce the observed linguistic regions.
   - The Potts model, which incorporates social conformity and uneven population distribution, explains the observed geographical distribution of language better.

### Conclusion

This paper uses methods from statistical physics to account for the geographical distribution patterns observed in the American language map. The results support the hypothesis that the geographical boundaries of linguistic features can be treated as analogous to domain walls in physical systems, and they are consistent with the view that language use in the United States is, or has been, exchanged or transmitted to a greater extent along the east-west axis than along the north-south axis. These findings offer a new perspective on the geographical distribution of language.
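
The model construction summarized above (a truncated Cauchy interaction kernel, an east-west anisotropy parameter A, and zero-temperature Metropolis updates) can be illustrated with a minimal Python sketch. The kernel's functional form, the parameter names `sigma` and `cutoff`, and the helper names are assumptions made for illustration, not the paper's exact definitions, and the sketch leaves out any population weighting the paper may apply.

```python
import numpy as np


def anisotropic_cauchy_kernel(xy, sigma=1.0, cutoff=5.0, A=1.15):
    """Pairwise weights from a truncated Cauchy-type kernel (assumed form).

    xy holds one (east, north) coordinate pair per node. Dividing the
    east-west separation by A > 1 makes east-west neighbours look closer,
    so they interact more strongly than north-south neighbours.
    """
    dx = (xy[:, None, 0] - xy[None, :, 0]) / A
    dy = xy[:, None, 1] - xy[None, :, 1]
    r2 = (dx**2 + dy**2) / sigma**2
    J = 1.0 / (1.0 + r2)
    J[r2 > cutoff**2] = 0.0      # truncation: no interaction beyond the cutoff
    np.fill_diagonal(J, 0.0)     # no self-interaction
    return J


def energy(J, s):
    """Potts energy: minus the total weight of like-state pairs."""
    same = s[:, None] == s[None, :]
    return -0.5 * np.sum(J * same)


def zero_temperature_sweep(J, s, q, rng):
    """One sweep of zero-temperature Metropolis updates, modifying s in place.

    A proposed state change is accepted only if it does not raise the energy.
    """
    n = len(s)
    for _ in range(n):
        i = rng.integers(n)
        new, old = rng.integers(q), s[i]
        if new == old:
            continue
        # Energy change of moving node i from state `old` to state `new`.
        dE = J[i, s == old].sum() - J[i, s == new].sum()
        if dE <= 0:
            s[i] = new
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy = rng.uniform(0.0, 10.0, size=(300, 2))   # hypothetical node positions
    s = rng.integers(3, size=300)                # three linguistic states
    J = anisotropic_cauchy_kernel(xy)
    for _ in range(20):
        zero_temperature_sweep(J, s, q=3, rng=rng)
    print("final energy:", energy(J, s))
```

Because each accepted move cannot raise the energy, repeated sweeps coarsen the random initial states into spatial domains. Setting `A=1` recovers the isotropic kernel for comparison.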