Multilingual acoustic word embedding models for processing zero-resource languages

Herman Kamper, Yevgen Matusevych, Sharon Goldwater

DOI: https://doi.org/10.48550/arXiv.2002.02109

2020-02-21

Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training–testing language pair.
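The word discrimination task mentioned in the abstract is commonly run as a same-different evaluation: every pair of embedded test segments is scored by a distance, and average precision measures how well same-word pairs rank ahead of different-word pairs. The sketch below is a minimal illustration that assumes cosine distance and the non-interpolated definition of average precision; the function names (`cosine_distance`, `average_precision`, `same_different_ap`) are hypothetical and not taken from the authors' code.

```python
import itertools

import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (assumed metric)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def average_precision(distances, is_same):
    """Non-interpolated average precision: rank pairs by ascending distance
    and average the precision at each rank where a same-word pair occurs."""
    order = np.argsort(distances, kind="stable")
    hits = np.asarray(is_same, dtype=bool)[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].sum() / n_pos)


def same_different_ap(embeddings, labels):
    """Score all pairs of embedded segments; a pair counts as 'same' when
    both segments carry the same word label."""
    distances, is_same = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        distances.append(cosine_distance(embeddings[i], embeddings[j]))
        is_same.append(labels[i] == labels[j])
    return average_precision(np.array(distances), is_same)
```

Because the number of pairs grows quadratically with the number of test segments, a practical implementation would usually compute all distances in one vectorised call (for example `scipy.spatial.distance.pdist` with `metric="cosine"`) rather than looping in Python.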

Computation and Language, Audio and Speech Processing

What problem does this paper attempt to address?

The paper addresses how to perform speech processing effectively in zero-resource languages, for which no labelled (transcribed) data is available, and in particular how to obtain high-quality acoustic word embeddings for such languages. Specifically, the researchers propose training acoustic word embeddings with multilingual supervised models. These models are trained on languages with abundant labelled data and then applied to zero-resource languages that have no labelled data at all. The approach aims to sidestep the difficulty of collecting large amounts of labelled data for low-resource languages while improving performance on zero-resource tasks such as example-based speech search, indexing and discovery. The paper proposes and tests two multilingual recurrent neural network models: a discriminative classifier and a correspondence autoencoder. Both are trained on labelled data from multiple well-resourced languages and then applied to unseen zero-resource languages. The experiments show that when trained on seven well-resourced languages, the two models perform similarly and both outperform unsupervised models trained only on the zero-resource languages. When trained on only a single well-resourced language, the correspondence autoencoder performs better, but performance depends more on the particular training–testing language pair.
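To make the two training set-ups more concrete, the following sketch shows one way labelled word segments from several well-resourced languages could be turned into training targets: class indices over a joint vocabulary for the classifier, and (input, target) pairs of same-word segments for the correspondence autoencoder. It is an illustration under stated assumptions, not the authors' pipeline. Word types are assumed to be keyed by (language, word), so identical spellings in different languages become separate classes; pairs are assumed to be formed only within a language and in both directions; and string segment IDs stand in for the acoustic feature sequences. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_joint_vocabulary(corpora):
    """corpora maps language -> list of (word, segment_id).
    Assumption: classes are keyed by (language, word), so the same spelling
    in two languages yields two different classes."""
    types = sorted({(lang, word) for lang, tokens in corpora.items() for word, _ in tokens})
    return {word_type: index for index, word_type in enumerate(types)}


def classifier_examples(corpora, vocab):
    """(segment, class index) examples for the discriminative classifier."""
    return [(seg, vocab[(lang, word)]) for lang, tokens in corpora.items() for word, seg in tokens]


def cae_pairs(corpora):
    """(input, target) pairs for the correspondence autoencoder: two different
    spoken instances of the same word type in the same language.
    Assumption: each pair is used in both directions."""
    pairs = []
    for tokens in corpora.values():
        by_word = defaultdict(list)
        for word, seg in tokens:
            by_word[word].append(seg)
        for segs in by_word.values():
            for a, b in combinations(segs, 2):
                pairs.append((a, b))
                pairs.append((b, a))
    return pairs
```

Once either model has been trained on such data, it can be applied to an unseen zero-resource language simply by encoding that language's unlabelled segments; no target-language labels are needed at that stage.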

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Multilingual acoustic word embeddings for zero-resource languages

Multilingual Neural Machine Translation for Zero-Resource Languages

Cross lingual transfer learning for zero-resource domain adaptation

Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

Zero-Resource Multilingual Model Transfer: Learning What to Share

Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages

Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora

Learning Cross-lingual Visual Speech Representations

Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR

Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension

Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations

Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

Cross-Lingual Transfer Learning for Speech Translation