Source Separation of Piano Concertos Using Hybrid LSTM-Transformer Model

JingYu Liu, Wei He, Jingjing Zhou, Wei Jiang
DOI: https://doi.org/10.1109/cost64302.2024.00011
2024-01-01
Abstract: Music source separation, the process of extracting independent audio streams from a complex mix, has traditionally focused on isolating vocals, drums, bass, and other primary sources. This study tackles the more intricate task of separating the piano component from a piano concerto, a challenge compounded by the diverse range of instruments and the dynamic shifts in volume and timbre. Unlike traditional music separation tasks, the piano’s distinct characteristics and its interaction with the orchestra demand a more nuanced approach. To address the scarcity of multi-track recordings for piano concertos, this research pioneers an artificial data synthesis strategy to create a robust training dataset. We introduce a novel hybrid deep learning model that integrates Long Short-Term Memory (LSTM) networks with a Transformer architecture, capitalizing on their complementary strengths to distinguish piano melodies from the rich tapestry of orchestral sounds. Our experiments demonstrate that this hybrid approach significantly outperforms conventional methods, with an improvement of 3.18 dB in signal-to-distortion ratio. These results not only validate the efficacy of the proposed method but also pave the way for innovative applications in classical music source separation.