Abstract: Human language is a unique form of communication in the natural world, distinguished by its structured nature. Most fundamentally, it is systematic, meaning that signals can be broken down into component parts that are individually meaningful -- roughly, words -- which are combined in a regular way to form sentences. Furthermore, the way in which these parts are combined maintains a kind of locality: words are usually concatenated together, and they form contiguous phrases, keeping related parts of sentences close to each other. We address the challenge of understanding how these basic properties of language arise from broader principles of efficient communication under information processing constraints. Here we show that natural-language-like systematicity arises in codes that are constrained by predictive information, a measure of the amount of information that must be extracted from the past of a sequence in order to predict its future. In simulations, we show that such codes approximately factorize their source distributions, and then express the resulting factors systematically and locally. Next, in a series of cross-linguistic corpus studies, we show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics. Our result suggests that human language performs a sequential, discrete form of Independent Components Analysis on the statistical distribution over meanings that need to be expressed. It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language is shaped by communication under cognitive constraints.
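To make the notion of predictive information concrete, here is a minimal sketch, not the authors' method. It assumes that a finite-window proxy is acceptable: the mutual information between the previous k symbols and the next k symbols, estimated from raw counts. The function name `block_mutual_information` and the window parameter `k` are hypothetical and chosen only for illustration. Predictive information proper concerns the whole past and the whole future, and count-based estimates of this kind tend to be biased on short sequences.

```python
from collections import Counter
from math import log2


def block_mutual_information(seq, k=1):
    """Plug-in estimate, in bits, of I(previous k symbols; next k symbols).

    Illustrative finite-window proxy for predictive information
    (an assumption of this sketch, not taken from the abstract).
    """
    # Slide over the sequence, pairing each length-k past with the length-k future.
    pairs = [
        (tuple(seq[i - k:i]), tuple(seq[i:i + k]))
        for i in range(k, len(seq) - k + 1)
    ]
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    future = Counter(f for _, f in pairs)
    # Sum of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (past[p] * future[f]))
        for (p, f), c in joint.items()
    )


if __name__ == "__main__":
    # A strictly alternating sequence: the last symbol fully determines the next.
    print(block_mutual_information("ab" * 500, k=1))
    # A constant sequence: nothing needs to be remembered to predict the future.
    print(block_mutual_information("a" * 100, k=1))
```

Under these assumptions, the alternating sequence yields close to one bit (one binary fact about the past is needed to predict the future), while the constant sequence yields zero.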

Information Flow in Pregroup Models of Natural Language

Rule-Based and Word-Level Statistics-Based Processing of Language: Insights from Neuroscience

Natural Language Parsing and Linguistic Theories

Linguistic Structure from a Bottleneck on Sequential Information Processing

Diagrammatic Negative Information

Natural Language: From Knowledge to Cognition

Preconditionals

Right-preordered groups from a categorical perspective

Theoretical Study of One-dimensional Chains of Metal Atoms in Nanotubes

The role of grammar in transition-probabilities of subsequent words in English text

Null-Prep as a systematic interlanguage phenomenon: Evidence from relative clauses, interrogatives, and sluicing constructions

The causal interaction between the subnetworks of a complex network

Putting Geometry and Function Together — Towards a Psychologically-Plausible Computational Model for Spatial Language Comprehension

Probing structural constraints of negation in Pretrained Language Models

Causal Graph in Language Model Rediscovers Cortical Hierarchy in Human Narrative Processing

Information Flow and Causality As Rigorous Notions ab Initio

Partial groups, pregroups and realisability of fusion systems

Explaining pretrained language models' understanding of linguistic structures using construction grammar

Information Flow Routes: Automatically Interpreting Language Models at Scale

Language Design as Information Renormalization

Modeling structure-building in the brain with CCG parsing and large language models