Abstract: Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms, where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

Research on the Distal Supervised Learning Model of Speech Inversion

Unsupervised Acoustic-to-Articulatory Inversion with Variable Vocal Tract Anatomy

Speaker-Independent Acoustic-to-Articulatory Speech Inversion

Speaker-independent speech inversion for recovery of velopharyngeal port constriction degree

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Auditory Model Inversion and Its Application

A deep recurrent approach for acoustic-to-articulatory inversion

Learning Model-Based F0 Production Through Goal-Directed Babbling

Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings

Speaker-independent Speech Inversion for Estimation of Nasalance

Evaluation of Linear Regression for Speaker Adaptation in HMM-Based Articulatory Movements Estimation

Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction

Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion

The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

Parameter inversion method of vocal fold dynamic model in pathological voice classification

Multi-Speaker Pitch Tracking Via Embodied Self-Supervised Learning

Supervised Speech Separation Based on Deep Learning: An Overview