Abstract: The speech signal conveys information on different time scales, from the short (20–40 ms) or segmental time scale, associated with phonological and phonetic information, to the long (150–250 ms) or supra-segmental time scale, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at the segmental level as the essential and invariant representations used in the temporal organization of speech.

In the context of speech processing, a deep neural network (DNN) is an effective computational method for inferring the probability of individual phonological classes from a short segment of the speech signal. The vector of all phonological class probabilities is referred to as the phonological posterior. Only a few classes are active in any short-term speech segment; hence, the phonological posterior is a sparse vector. Although phonological posteriors are estimated at the segmental level, we claim that they convey supra-segmental information. Specifically, we demonstrate that phonological posteriors are indicative of syllabic and prosodic events.

Building on converging linguistic evidence on the gestural model of Articulatory Phonology, as well as on the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and that this information is embedded in their support (the indices of the active coefficients). To verify this hypothesis, we obtain a binary representation of phonological posteriors at the segmental level, referred to as the first-order sparsity structure; high-order structures are obtained by concatenating first-order binary vectors. We then confirm that the classification of supra-segmental linguistic events, a problem known as linguistic parsing, can be achieved with high accuracy using simple binary pattern matching of first-order or high-order structures.
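The pipeline described in the abstract (binarize each posterior, concatenate binary vectors, match patterns) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 0.5 threshold, the Hamming-distance matching rule, the four-class toy posteriors, and the "stressed"/"unstressed" codebook are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Binary = Tuple[int, ...]


def binarize(posterior: Sequence[float], threshold: float = 0.5) -> Binary:
    """First-order structure: mark the classes whose probability exceeds
    the threshold (the threshold value is an assumption)."""
    return tuple(1 if p > threshold else 0 for p in posterior)


def high_order(frames: Sequence[Binary], order: int) -> List[Binary]:
    """High-order structures: concatenate `order` consecutive
    first-order binary vectors into one longer binary pattern."""
    return [
        tuple(bit for frame in frames[i:i + order] for bit in frame)
        for i in range(len(frames) - order + 1)
    ]


def hamming(a: Binary, b: Binary) -> int:
    """Number of positions at which two binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(pattern: Binary, codebook: Dict[str, List[Binary]]) -> str:
    """Binary pattern matching: return the label whose stored pattern is
    closest to `pattern` in Hamming distance (an assumed matching rule)."""
    return min(
        codebook,
        key=lambda label: min(hamming(pattern, ref) for ref in codebook[label]),
    )


# Toy posteriors over four hypothetical phonological classes, one per frame.
posteriors = [
    [0.90, 0.10, 0.80, 0.05],
    [0.85, 0.20, 0.70, 0.10],
    [0.10, 0.90, 0.20, 0.60],
]
first_order = [binarize(p) for p in posteriors]
second_order = high_order(first_order, order=2)

# Hypothetical codebook of second-order patterns for two event labels.
codebook = {
    "stressed": [(1, 0, 1, 0, 1, 0, 1, 0)],
    "unstressed": [(0, 1, 0, 1, 0, 1, 0, 1)],
}
label = classify(second_order[0], codebook)
```

In this toy setup, the first two frames share the same support, so their concatenation matches the hypothetical "stressed" pattern exactly. In practice the codebook would be collected from labeled training data rather than written by hand.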

Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors

A Variational Bayesian Approximation Approach Via A Sparsity Enforcing Prior In Acoustic Imaging

An Efficient Variational Bayesian Inference Approach Via Studient's-t Priors for Acoustic Imaging in Colored Noises

Adaptive Speech Enhancement Using Sparse Prior Information.

On structured sparsity of phonological posteriors for linguistic parsing

A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex

Learning Midlevel Auditory Codes from Natural Sound Statistics

Phase-Optimized K-SVD for Signal Extraction from Underdetermined Multichannel Sparse Mixtures

An Improved Sparse Reconstruction Algorithm for Speech Compressive Sensing Using Structured Priors.

Sparse reconstruction of sound field using pattern-coupled Bayesian compressive sensing

Sparse coding for sound event classification

Stage-Wise and Prior-Aware Neural Speech Phase Prediction

Hierarchical Sparse Coding from a Bayesian Perspective.

Study on Sparse Coding Speech Enhancement

Sparse Nonnegative Matrix Factorization Strategy for Cochlear Implants

Log Complex Color for Visual Pattern Recognition of Total Sound

Low-rankness of Complex-valued Spectrogram and Its Application to Phase-aware Audio Processing

Generative adversarial networks with physical sound field priors

Supervised Monaural Speech Enhancement Using Complementary Joint Sparse Representations.

Compressive Phase Retrieval: Optimal Sample Complexity with Deep Generative Priors

Physics-inspired Neuroacoustic Computing Based on Tunable Nonlinear Multiple-scattering