Unsupervised embedding of trajectories captures the latent structure of scientific migration

Dakota Murray, Jisung Yoon, Sadamori Kojaku, Rodrigo Costas, Woo-Sung Jung, Staša Milojević, Yong-Yeol Ahn
DOI: https://doi.org/10.1073/pnas.2305414120
2023-11-18
Abstract: Human migration and mobility drive major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout history, advances and globalization are making other factors such as language and culture increasingly important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a digital double that can be distributed, re-used, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records. Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science.
Machine Learning, Physics and Society
What problem does this paper attempt to address?
The paper addresses the problem of functional distance in the study of human migration and mobility, and proposes a new method for representing and understanding this complex phenomenon. Specifically:

1. **The issue of functional distance**:
   - Geographic distance alone can no longer fully explain modern human migration patterns, because factors such as language, culture, and economic opportunities have become increasingly important.
   - The paper introduces a new representation intended to capture these multidimensional factors and so better explain migration patterns.
2. **Application of the word2vec model**:
   - The paper uses the word2vec model (originally designed for natural language processing) to embed migration trajectories, producing a dense, continuous vector-space representation of locations.
   - This representation captures the functional distance between locations, as well as cultural, linguistic, and prestige relationships.
3. **Study of scientific migration**:
   - The paper focuses on scientific migration, applying the method to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records.
   - The results show that word2vec embedding distance describes actual migration flows more accurately than geographic distance.
4. **Theoretical foundation and methodological framework**:
   - The paper establishes a mathematical equivalence between word2vec and the gravity model of mobility, which provides a theoretical foundation for the method.
   - The approach applies to scientific migration and can also be used to study migration and mobility beyond science.

In summary, the paper proposes a methodological framework that uses neural embeddings to represent and understand the multidimensional relationships in human migration, with scientific migration as its main case study.
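
The two ideas summarized above (trajectories treated like sentences, and a gravity-style flux computed from an embedding distance) can be illustrated with a minimal Python sketch. It is not the authors' code or data. The toy trajectories, the hand-picked two-dimensional vectors, and the names `skipgram_pairs`, `cosine_distance`, and `gravity_flux` are all hypothetical. The exponential decay in distance is an assumed functional form chosen for illustration. In practice the vectors would be learned by a word2vec implementation rather than written by hand.

```python
import math
from collections import Counter

# Hypothetical toy data: each list is one scientist's ordered sequence of
# affiliations, treated like a "sentence" whose "words" are locations.
trajectories = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "A"],
]


def skipgram_pairs(trajectory, window=1):
    """Return (target, context) pairs at most `window` steps apart."""
    pairs = []
    for i, target in enumerate(trajectory):
        lo = max(0, i - window)
        hi = min(len(trajectory), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, trajectory[j]))
    return pairs


# Co-occurrence counts are the raw signal a skip-gram model is trained on.
pair_counts = Counter(p for t in trajectories for p in skipgram_pairs(t))

# "Mass" of a location: how often it appears across all trajectories.
mass = Counter(loc for t in trajectories for loc in t)


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def gravity_flux(m_i, m_j, d_ij, c=1.0):
    """Gravity-style flux c * m_i * m_j * exp(-d_ij); decay form is assumed."""
    return c * m_i * m_j * math.exp(-d_ij)


# Hand-picked vectors standing in for learned embeddings (assumption).
vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}

# Predicted flux between every ordered pair of distinct locations.
predicted = {
    (i, j): gravity_flux(mass[i], mass[j], cosine_distance(vectors[i], vectors[j]))
    for i in vectors
    for j in vectors
    if i != j
}
```

Comparing `predicted` with `pair_counts` for the same pairs gives a rough sense of how an embedding-based distance could be checked against observed flows. A real analysis would fit the constant `c` and the decay function to data.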