Abstract: Language spreading is a complex mechanism that involves factors such as culture, economics, migration, and population. In this paper, we propose a set of methods to model the dynamics of the spreading system. To model the randomness of language spreading, we propose the Batch Markov Monte Carlo Simulation with Migration (BMMCSM) algorithm, in which each agent is treated as a language stack. The agent learns languages and migrates based on the proposed Batch Markov Property according to the transition matrix T and the migration matrix M. Since population plays a crucial role in language spreading, we also introduce the Mortality and Fertility Mechanism, which controls the birth and death of the simulated agents, into the BMMCSM algorithm. The simulation results of BMMCSM show that the numerical and geographic distribution of languages varies over time. The change in distribution fits the trend of world cultural and economic development. When we construct matrix T, some entries can be calculated directly from historical statistics, while others are unknown. Thus, the success of BMMCSM depends on accurately estimating the unknown entries of T under the supervision of the known entries. To achieve this, we first construct a 20 × 20 × 5 factor tensor X to characterize each entry of T. We then train a Random Forest Regressor on the known entries of T and use the trained regressor to predict the unknown entries. We choose Random Forest (RF) because, compared to a single decision tree, it overcomes the problem of overfitting, and the Shapiro test suggests that the RF residuals follow a normal distribution.
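The agent update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the agent layout (a `stack` of language indices plus a `zone`), the choice to condition learning on the most recently learned language, and the reading of a diagonal draw as "learn nothing new" are all assumptions made for illustration. The Mortality and Fertility Mechanism and the dependence of migration on the mother-tongue zone are omitted, and the example matrices are made-up numbers.

```python
import random


def sample_index(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def step(agent, T, M, rng):
    """One simulated step: the agent may learn a language, then may migrate.

    Assumption: learning is conditioned on the top of the language stack,
    and drawing an already-known language means nothing new is learned.
    """
    current = agent["stack"][-1]
    learned = sample_index(T[current], rng)
    if learned not in agent["stack"]:
        agent["stack"].append(learned)
    # Migration is drawn from the row of M for the agent's current zone.
    agent["zone"] = sample_index(M[agent["zone"]], rng)
    return agent


def simulate(agents, T, M, steps, seed=0):
    """Apply `step` to every agent for a fixed number of steps."""
    rng = random.Random(seed)
    for _ in range(steps):
        for agent in agents:
            step(agent, T, M, rng)
    return agents


def speakers_per_language(agents, n_languages):
    """Count how many agents have each language somewhere in their stack."""
    counts = [0] * n_languages
    for agent in agents:
        for lang in agent["stack"]:
            counts[lang] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 3-language world; rows are illustrative, not estimated.
    T = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.4, 0.4, 0.2]]
    M = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
    agents = [{"stack": [i % 3], "zone": i % 3} for i in range(300)]
    simulate(agents, T, M, steps=10)
    print(speakers_per_language(agents, 3))
```

Running many such simulations and averaging the per-language counts is the Monte Carlo part; the fixed seed only makes a single run repeatable.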

What problem does this paper attempt to address?

This paper attempts to solve the problem of complex mechanisms in language spread, including the influence of factors such as culture, economy, immigration and population on language distribution. Specifically, the author proposes an algorithm based on Batch Markov Monte Carlo Simulation with Migration (BMMCSM) to model the dynamic process of language spread. The following are the main research questions of this paper:

1. **Modeling the randomness of language spread**:
   - The author proposes a new Batch Markov Property (BMP), in which each agent is regarded as a language stack and learns and migrates according to the transition matrix \( T \) and the migration matrix \( M \).
   - The element \( t_{ij} \) in the transition matrix \( T=(t_{ij})_{N\times N} \) represents the probability that an agent will learn language \( l_j \) when it has mastered language \( l_i \):
     \[ t_{ij}=P\{\text{agent will learn } l_j \mid \text{agent has mastered } l_i\} \]
2. **The influence of population change on language spread**:
   - The Mortality and Fertility Mechanism (MFM) is introduced to simulate the birth and death of agents, so as to more accurately reflect the influence of population change on language spread.
3. **Introduction of migration patterns**:
   - The migration preference matrix \( M=(m_{ij})_{N\times N} \) is proposed, where \( m_{ij} \) represents the probability that an agent will migrate from language area \( i \) to language area \( j \):
     \[ m_{ij}=P\{\text{agent will migrate to language zone } j \mid \text{agent stays in language zone } i\} \]
   - The migration pattern depends not only on the current residence location of the agent, but also on the language area where its mother tongue is located.
4. **Estimation of the transition matrix \( T \)**:
   - In order to accurately estimate the unknown elements in the transition matrix \( T \), the author constructs a 20×20×5 factor tensor \( \vec{X} \) and uses the Random Forest Regressor for prediction.
   - The factor tensor \( \vec{X} \) contains five factors: language similarity, net outflow of foreign direct investment (FDI NO), economic interaction between language areas, cultural soft power and migration preference.

Through these methods, the paper aims to establish a model that can accurately predict future language distribution, taking into account multiple influencing factors, and verifies the effectiveness of the model through large-scale simulation experiments. The final results show that the numerical and geographical distribution of languages will change significantly in the next 50 years, reflecting the impact of globalization and technological progress on language spread.
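The estimation step in item 4 could be sketched roughly as follows, using scikit-learn's `RandomForestRegressor`. This is an illustration under stated assumptions, not the paper's code: the function name `estimate_transition_matrix`, the boolean `known_mask`, the hyperparameters, and the synthetic data in the demo are all hypothetical. Only the overall shape (a 20 × 20 × 5 factor tensor, training on known entries, predicting unknown ones) comes from the text above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def estimate_transition_matrix(X, T_known, known_mask, seed=0):
    """Fill the unknown entries of T with Random Forest predictions.

    X          : (N, N, F) factor tensor, one feature vector per entry of T
    T_known    : (N, N) matrix; values where known_mask is False are ignored
    known_mask : (N, N) boolean array, True where t_ij is known
    """
    n = T_known.shape[0]
    features = X.reshape(n * n, -1)   # one row of F factors per (i, j) pair
    targets = T_known.reshape(n * n)
    mask = known_mask.reshape(n * n)

    # Hyperparameters are illustrative assumptions, not taken from the paper.
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(features[mask], targets[mask])

    T_hat = T_known.astype(float).copy()
    if (~mask).any():
        # Row-major boolean indexing matches the order of the flattened rows.
        T_hat[~known_mask] = model.predict(features[~mask])
    return T_hat


if __name__ == "__main__":
    # Synthetic stand-in data; real factors would come from statistics.
    rng = np.random.default_rng(0)
    N, F = 20, 5
    X = rng.random((N, N, F))
    T_true = X.mean(axis=2)               # made-up ground truth
    known_mask = rng.random((N, N)) < 0.6
    T_hat = estimate_transition_matrix(
        X, np.where(known_mask, T_true, 0.0), known_mask
    )
    print(np.abs(T_hat - T_true)[~known_mask].mean())
```

Known entries are left untouched, so the regressor only supplies values where historical statistics are missing. Because a forest averages training targets, its predictions stay within the range of the known entries.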

Language Distribution Prediction based on Batch Markov Monte Carlo Simulation with Migration
