Abstract: This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the "content" of linguistic parameters from their representation. Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning vocabularies and grammars from unsegmented text and continuous speech, and mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same structure to utterances as humans do. This work has application to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.

Unsupervised Spoken Term Discovery on Untranscribed Speech

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks

Weakly supervised spoken term discovery using cross-lingual side information

Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection

Spoken-Term Discovery using Discrete Speech Units

Unsupervised two-stage keyword extraction from spoken documents by topic coherence and support vector machine

Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages

Towards Unsupervised Semantic Retrieval Of Spoken Content With Query Expansion Based On Automatically Discovered Acoustic Patterns

Towards Unsupervised Speech Recognition Without Pronunciation Models

Word Discovery in Visually Grounded, Self-Supervised Speech Models

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks

Spoken Term Detection from Bilingual Spontaneous Speech Using Code-Switched Lattice-Based Structures for Words and Subword Units

On the Unsupervised Analysis of Domain-Specific Chinese Texts

Unsupervised Language Acquisition

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming