Hubs in Languages: Scale Free Networks of Synonyms

Hanna E. Makaruk, Robert Owczarek
DOI: https://doi.org/10.48550/arXiv.0802.4112
2008-02-28
Physics and Society
Abstract: Natural languages are described in this paper in terms of networks of synonyms: a word is identified with a node, and synonyms are connected by undirected links. Our statistical analysis of the network of synonyms in the Polish language showed that it is scale-free, similar to what is known for English. The statistical properties of the two networks are also similar. Thus, the statistical aspects of the networks are good candidates for culture-independent elements of human language. We hypothesize that optimization for robustness and efficiency is responsible for this universality. Despite the statistical similarity, there is no one-to-one mapping between the networks of these two languages. Although many hubs in Polish are translated into similarly highly connected hubs in English, there are also hubs specific to only one of the languages: a single word in one language is equivalent to many different and disconnected words in the other, in accordance with the Whorf hypothesis of linguistic relativity. Identifying language-specific hubs is vitally important for automatic translation, and for understanding contextual, culturally related messages that are frequently missed or twisted in a naive, literal translation.
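The network model described in the abstract (words as nodes, synonym pairs as undirected links, hubs as highly connected words) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and does not reproduce the authors' code. The synonym pairs are a hypothetical toy list, not data from the paper. "Hubs" are taken here to be simply the highest-degree nodes. The exponent γ of a degree distribution P(k) ∝ k^(−γ) is estimated with the standard approximate discrete maximum-likelihood formula of Clauset, Shalizi and Newman, which may differ from the fitting procedure used in the paper. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy synonym pairs for illustration only; NOT the paper's data.
PAIRS = [
    ("big", "large"), ("big", "huge"), ("big", "great"), ("big", "vast"),
    ("large", "huge"), ("great", "grand"),
    ("fast", "quick"), ("quick", "rapid"),
]


def build_synonym_network(pairs):
    """Undirected graph as an adjacency map: word -> set of its synonyms."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # a word is not its own synonym; skip self-loops
        graph[a].add(b)  # store the link in both directions,
        graph[b].add(a)  # since synonymy is treated as symmetric (undirected)
    return dict(graph)


def degree_distribution(graph):
    """Map each degree k to the number of words with exactly k synonyms."""
    return Counter(len(nbrs) for nbrs in graph.values())


def hubs(graph, top=3):
    """Return the `top` highest-degree words (ties broken alphabetically)."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))[:top]


def power_law_exponent(degrees, k_min=1):
    """Approximate discrete MLE of gamma in P(k) ~ k**-gamma for k >= k_min.

    Uses gamma ~= 1 + n / sum(ln(k / (k_min - 0.5))) (Clauset, Shalizi and
    Newman). The estimate is only meaningful for large networks.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


graph = build_synonym_network(PAIRS)
degrees = [len(nbrs) for nbrs in graph.values()]
print("degree distribution:", dict(sorted(degree_distribution(graph).items())))
print("hubs:", hubs(graph))
print("gamma estimate:", round(power_law_exponent(degrees), 3))
```

On this toy list, "big" is the single word with four synonyms and so stands out as the hub. Nine words are far too few to establish or refute a scale-free distribution, so the γ estimate here only demonstrates the mechanics. A real analysis would use a full synonym dictionary and would test the power-law fit against alternative distributions.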
What problem does this paper attempt to address?