A Real-time Audio-to-audio Karaoke Generation System for Monaural Recordings Based on Singing Voice Suppression and Key Conversion Techniques

Hideyuki Tachibana,Yu Mizuno,Nobutaka Ono,Shigeki Sagayama
DOI: https://doi.org/10.2197/ipsjjip.24.470
2016-01-01
Journal of Information Processing
Abstract:This paper describes an automatic karaoke generation system, which can suppress the singing voice in audio music signals, and can also change the pitch of the song. Furthermore, this system accepts the streaming input, and it works in real-time. To the best of our knowledge, there have been no real-time audio-to-audio karaoke system that has the two functions above. This paper particularly describes the two technical components, as well as some comments on the implementation. In this system, the authors employed two signal processing techniques: singing voice suppression that is based on two-stage HPSS, a vocal enhancement technique that the authors proposed previously, and a pitch shift technique that is based on the spectrogram stretch and phase vocoder. The attached video file shows that the system works in real-time, and the sound quality may be practically acceptable.
What problem does this paper attempt to address?