Dependency Distance as a Metric of Language Comprehension Difficulty

Haitao Liu
DOI: https://doi.org/10.17791/jcs.2008.9.2.159
2008-01-01
Abstract: Linguistic complexity is a measure of the cognitive difficulty of human language processing. The present paper proposes dependency distance, in the framework of dependency grammar, as an insightful metric of complexity. Three hypotheses are formulated: (1) the human language parser prefers linear orders that minimize the average dependency distance of the recognized sentence; (2) there is a threshold that the average dependency distance of most sentences or texts of human languages does not exceed; (3) grammar and cognition combine to keep dependency distance within the threshold. Twenty corpora from different languages with dependency syntactic annotation are used to test these hypotheses. The paper reports the average dependency distance in these corpora and analyzes the factors that influence dependency distance. The findings support all three hypotheses: average dependency distance tends to be minimized in human language, it stays below a threshold of 3 words, and grammar plays an important role in constraining distance. Some questions remain open for further research.
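The abstract does not spell out how average dependency distance is computed. As a minimal sketch, assume the distance of one dependency is the absolute difference between the linear positions of a word and its governor, and that the sentence root (which has no governor) is excluded from the average. The function name `mean_dependency_distance` and the `heads` encoding are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """Average dependency distance of one sentence.

    Assumed encoding (hypothetical, chosen for illustration): heads[i] is the
    1-based position of the governor of word i + 1, and 0 marks the root.
    """
    # Distance of each link = |governor position - dependent position|.
    # The root has no governor, so it contributes no link.
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no dependency links")
    return sum(distances) / len(distances)


# "The boy ate an apple": The -> boy, boy -> ate, ate = root,
# an -> apple, apple -> ate
heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(heads))  # 1.25
```

Under these assumptions the example sentence has four links with distances 1, 1, 1, and 2, so its average is 1.25, below the 3-word threshold reported in the abstract. A corpus-level average could be obtained by pooling the link distances of all sentences in the same way.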