Abstract: Slavic languages are generally assumed to combine rich morphology with free word order. Exploring this complexity trade-off can help us better understand the relationship between morphology and syntax in natural languages. However, few quantitative investigations of this relationship have been carried out for Slavic languages. Based on 34 annotated corpora from Universal Dependencies, this paper examines the correlation between morphology and syntax within Slavic languages by applying two metrics of morphological richness and two metrics of word order freedom. Our findings are as follows. First, the quantitative metrics adopted capture the distributions of morphological richness and word order freedom across languages well. Second, the metrics corroborate the correlation between morphological richness and word order freedom: within Slavic languages, this correlation is moderate and statistically significant, such that the richer the morphology, the less strict the word order. Third, Slavic languages can be clustered into three subgroups based on classification models. Most importantly, ancient Slavic languages are characterized by richer morphology and more flexible word order than modern ones. Fourth, of the two possible confounding factors considered, corpus size does not greatly affect the results of the metrics, whereas corpus genre plays an important part in the measurement of word order freedom. Specifically, the word order of formal written genres tends to be more rigid than that of informal written and spoken ones.
Overall, based on annotated corpora, the results confirm the negative correlation between morphological richness and word order rigidity within Slavic languages. This finding may shed light on the dynamic relationship between morphology and syntax in natural languages and provides quantitative evidence of how languages encode lexical and syntactic information for efficient communication.

Morphology and Word Order in Slavic Languages: Insights from Annotated Corpora
