Consistent Spectrogram Separation from Nonstationary Mixture

Adrien Meynard, Ama Marina Kreme
2024-06-25
Abstract: We present a spectrogram separation method tailored for mixtures comprising two nonstationary components. By exploiting the unique characteristics of their time-frequency representations, we propose an inverse problem formulation to estimate the spectrograms of the components. We then introduce an alternating optimization algorithm that ensures the consistency of the estimated spectrograms. The efficacy of the algorithm is evaluated through testing on synthetic mixtures and is applied to a bioacoustic signal.
Signal Processing
What problem does this paper attempt to address?
This paper addresses the problem of **separating the spectrograms of two components from a nonstationary mixture**. Specifically, the authors study how to decompose the spectrogram of a single-channel mixture containing two nonstationary components into the spectrograms of the individual components. The problem matters in audio signal processing, particularly for sound event detection, speech enhancement, and single-channel audio source separation.

### Problem Background

In many applications, spectrogram separation is a necessary step before the target task can be performed. In sound event detection and localization, speech enhancement, or single-channel audio source separation, the common approach is to take the spectrogram of the mixture as input and generate a mask for the spectrogram of each source to be separated. Although these methods are framed as audio source separation, their first step is in fact spectrogram separation.

### Specific Problem Description

The paper focuses on spectrogram separation for mixtures containing two nonstationary components. This separation can be regarded as the first stage of single-channel source separation, which is itself a major challenge because it is an underdetermined blind source separation problem: the number of sources exceeds the number of observations. To address this challenge, existing methods usually exploit characteristics of the source signals and the requirements of the application. They include:

- **Probabilistic models**: Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), and factorial Hidden Markov Models (factorial HMM).
- **Spectral decomposition methods**: Independent Subspace Analysis (ISA) and Non-negative Matrix Factorization (NMF).
- **Computational Auditory Scene Analysis (CASA)**: imitates the way the human auditory system separates sound sources.
- **Deep Neural Network (DNN) methods**: separate speech signals effectively in the time-frequency domain.

Each of these methods has limitations, such as requiring a large amount of training data or struggling to separate musical instruments with similar tonal characteristics. The paper therefore proposes a new method aimed at these limitations.

### Proposed Solution

The paper formulates spectrogram separation as an inverse problem and uses an alternating optimization algorithm to ensure that the estimated spectrograms are consistent. The approach has four steps:

1. **Model Definition**: defines two types of nonstationary signals, bump signals and multicomponent amplitude-modulated, frequency-modulated (AM-FM) signals, and gives their mathematical expressions.
2. **Inverse Problem Construction**: poses an optimization problem that minimizes the difference between the observed spectrogram and the estimated spectrograms, with a regularization term that promotes smoothness and sparsity.
3. **Alternating Optimization Algorithm**: updates the spectrograms of the two components in turn, approaching the optimal solution step by step while ensuring that the estimated spectrograms satisfy the consistency constraint.
4. **Numerical Experiments**: verifies the effectiveness of the method on synthetic signals and real audio signals.

With this method, the authors separate the two components of the mixed signal and show its practical potential for bioacoustic signal processing.
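
The alternating update pattern in step 3 can be illustrated on a toy problem. The sketch below is not the authors' algorithm: the objective, the function names (`separate`, `soft_threshold`, `objective`), the weights `lam_smooth` and `lam_sparse`, and the synthetic data are all assumptions made for illustration. It splits a nonnegative matrix `S_mix` into one part that varies slowly along the time axis and one part that is sparse, by minimizing a least-squares data fit plus a smoothness penalty on the first part and an L1 penalty on the second. The consistency constraint described in the paper is deliberately left out, and nonnegativity of the estimates is not enforced.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def objective(S, S1, S2, lam_smooth, lam_sparse):
    """Data fit + temporal smoothness of S1 + sparsity of S2 (assumed toy cost)."""
    fit = 0.5 * np.sum((S - S1 - S2) ** 2)
    smooth = lam_smooth * np.sum(np.diff(S1, axis=1) ** 2)
    sparse = lam_sparse * np.sum(np.abs(S2))
    return fit + smooth + sparse


def separate(S, lam_smooth=5.0, lam_sparse=0.5, n_iter=30):
    """Alternate exact minimizations over S1 (smooth) and S2 (sparse)."""
    n_time = S.shape[1]
    # First-difference operator along time: D @ x equals np.diff(x).
    D = np.diff(np.eye(n_time), axis=0)
    # Normal-equation matrix of the quadratic S1 subproblem.
    A = np.eye(n_time) + 2.0 * lam_smooth * (D.T @ D)
    S1 = np.zeros_like(S)
    S2 = np.zeros_like(S)
    history = []
    for _ in range(n_iter):
        # S1 update: solve S1 @ A = S - S2 (A is symmetric).
        S1 = np.linalg.solve(A, (S - S2).T).T
        # S2 update: closed-form shrinkage of the current residual.
        S2 = soft_threshold(S - S1, lam_sparse)
        history.append(objective(S, S1, S2, lam_smooth, lam_sparse))
    return S1, S2, history


# Synthetic "spectrogram": one slowly varying ridge plus a few isolated peaks.
rng = np.random.default_rng(0)
n_freq, n_time = 16, 40
smooth_part = np.zeros((n_freq, n_time))
smooth_part[5] = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(n_time) / n_time)
sparse_part = np.zeros((n_freq, n_time))
sparse_part[rng.integers(0, n_freq, 6), rng.integers(0, n_time, 6)] = 3.0
S_mix = smooth_part + sparse_part

S1_hat, S2_hat, history = separate(S_mix)
print(f"objective: first={history[0]:.3f}, last={history[-1]:.3f}")
```

Because each update solves its subproblem exactly, the recorded objective cannot increase from one iteration to the next, which is the basic property that makes alternating schemes of this kind attractive. A version closer to the paper would also have to enforce that each estimate is a valid spectrogram of some signal, which this toy example does not attempt.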