Geospatial analysis of toponyms in geotagged social media posts

Takayuki Hiraoka, Takashi Kirimura, Naoya Fujiwara
2024-10-04
Abstract: Place names, or toponyms, play an integral role in human representation and communication of geographic space. In particular, how people relate each toponym with particular locations in geographic space should be indicative of their spatial perception. Here, we make use of an extensive dataset of georeferenced social media posts, retrieved from Twitter, to perform a statistical analysis of the geographic distribution of toponyms and uncover the relationship between toponyms and geographic space. We show that the occurrence of toponyms is characterized by spatial inhomogeneity, giving rise to patterns that are distinct from the distribution of common nouns. Using simple models, we quantify the spatial specificity of toponym distributions and identify their core-periphery structures. In particular, we find that toponyms are used with a probability that decays as a power law with distance from the geographic center of their occurrence. Our findings highlight the potential of social media data to explore linguistic patterns in geographic space, paving the way for comprehensive analyses of human spatial representations.
Physics and Society
What problem does this paper attempt to address?
The paper attempts to address the issue of understanding the distribution patterns of toponyms in geographic space and their relationship with human spatial perception. Specifically, the authors utilize a large dataset of geotagged social media posts to explore the distribution characteristics of toponyms in geographic space through statistical analysis and reveal how these distributions reflect collective spatial cognition.

### Main Research Questions:

1. **Geographic Distribution Characteristics of Toponyms**: Investigate whether the distribution of toponyms in geographic space has specific patterns and how these patterns differ from the distribution of common nouns.
2. **Spatial Specificity of Toponym Usage**: Explore whether the usage of toponyms is related to the distance from their referred geographic center, i.e., whether the frequency of toponym usage decreases with increasing distance.
3. **Collective Spatial Cognition**: Understand collective-level spatial cognition and representation by analyzing the distribution of toponyms.

### Research Methods:

- **Data Source**: Collected over 395 million geotagged tweets from the Twitter API, within the geographic scope of Japan.
- **Data Processing**: Excluded posts made through Foursquare check-in services and automated bots, ultimately retaining 277 million posts.
- **Analysis Methods**:
  - Statistical Analysis: Conducted statistical analysis on the geographic distribution of toponyms and common nouns to explore their distribution heterogeneity and patterns.
  - Model Construction: Proposed a "core-periphery" model, hypothesizing that the probability of toponyms appearing within a certain range is constant, and beyond that range, it decays following a power law with distance.

### Main Findings:

1. **Heterogeneity of Toponym Distribution**: The distribution of toponyms exhibits significant spatial heterogeneity, differing from the distribution of common nouns.
2. **Power Law Decay**: The frequency of toponym usage decays following a power law with increasing distance from the geographic center.
3. **Core-Periphery Structure**: The distribution of toponyms can be divided into core and peripheral regions, with higher usage frequency in the core region and gradually decreasing in the peripheral region.
4. **Model Validation**: The proposed "core-periphery" model effectively explains the bifurcated scaling behavior of toponym distribution, outperforming location-independent models.

### Conclusion:

This study reveals the distribution patterns of toponyms in geographic space through big data analysis, emphasizing the relationship between toponym usage and distance from the geographic center. It provides a new perspective for understanding human collective spatial cognition. These findings not only contribute to research in linguistics and sociology but also offer valuable references for geographic information systems and urban planning.
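
To make the "core-periphery" model from the Research Methods concrete, here is a minimal Python sketch. The summary states only that the usage probability is constant within some range and decays as a power law beyond it, so the exact functional form below (continuous at the core boundary) is an assumption. The names `core_periphery_prob`, `p_core`, `r_core_km`, `alpha`, and `haversine_km`, and the mean Earth radius of 6371 km, are hypothetical choices for illustration and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not from the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def core_periphery_prob(r_km, p_core, r_core_km, alpha):
    """Assumed form of the core-periphery model.

    Inside the core (r <= r_core_km) the probability is the constant p_core;
    outside it decays as a power law, p_core * (r / r_core_km) ** (-alpha).
    The two branches agree at r = r_core_km by construction.
    """
    if r_km < 0 or r_core_km <= 0:
        raise ValueError("distances must satisfy r_km >= 0 and r_core_km > 0")
    if r_km <= r_core_km:
        return p_core
    return p_core * (r_km / r_core_km) ** (-alpha)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not fitted to any data.
    for r in (1.0, 5.0, 10.0, 50.0):
        print(r, core_periphery_prob(r, p_core=0.2, r_core_km=5.0, alpha=2.0))
```

In practice, `r_km` would be the distance between a post's geotag and the toponym's geographic center, for example as computed by `haversine_km`. Fitting `p_core`, `r_core_km`, and `alpha` to observed data is outside the scope of this sketch.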