Metrical-accent Aware Vocal Onset Detection in Polyphonic Audio

Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra
DOI: https://doi.org/10.48550/arXiv.1707.06163
2017-07-19
Abstract: The goal of this study is the automatic detection of onsets of the singing voice in polyphonic audio recordings. Starting with the hypothesis that knowledge of the current position in a metrical cycle (i.e. metrical accent) can improve the accuracy of vocal note onset detection, we propose a novel probabilistic model to jointly track beats and vocal note onsets. The proposed model extends a state-of-the-art model for beat and meter tracking, in which the a priori probability of a note at a specific metrical accent interacts with the probability of observing a vocal note onset. We carry out an evaluation on a varied collection of multi-instrument datasets from two music traditions (English popular music and Turkish makam) with different types of metrical cycles and singing styles. Results confirm that the proposed model reasonably improves vocal note onset detection accuracy compared to a baseline model that does not take metrical position into account.
Sound, Computation and Language, Multimedia
What problem does this paper attempt to address?
This paper addresses the automatic detection of vocal note onsets, i.e., the times at which sung notes begin, in polyphonic audio recordings. It proposes a probabilistic model that tracks beats and vocal note onsets jointly. The model takes into account the current position in the metrical cycle (the metrical accent), on the hypothesis that this knowledge improves the accuracy of vocal note onset detection. Evaluated on datasets from two music traditions (English popular music and Turkish makam), the proposed model reasonably improves vocal note onset detection accuracy compared to a baseline model that does not take metrical position into account.
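The interaction between a metrical-accent prior and an onset observation can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the model proposed in the paper: the fusion rule (a naive product of two independent probabilities, renormalized), the function names, and the prior values in `ACCENT_PRIOR` are all hypothetical and chosen only for illustration.

```python
# Toy illustration: rescoring per-frame onset probabilities with a prior
# that depends on metrical position. NOT the paper's model; the fusion
# rule and all numbers below are assumptions made for this sketch.

# Hypothetical a priori probability of a note starting at each beat of a
# 4-beat cycle (beat 0 is the downbeat).
ACCENT_PRIOR = {0: 0.7, 1: 0.4, 2: 0.6, 3: 0.4}


def fuse_onset_probability(p_obs, p_prior):
    """Combine an observation probability with a metrical prior.

    Treats the two as independent evidence for the same binary event
    (onset vs. no onset) and renormalizes the product.
    """
    onset = p_obs * p_prior
    no_onset = (1.0 - p_obs) * (1.0 - p_prior)
    total = onset + no_onset
    return onset / total if total > 0 else 0.0


def rescore(frames):
    """Rescore a list of (beat_index, observation_probability) pairs."""
    return [fuse_onset_probability(p, ACCENT_PRIOR[beat]) for beat, p in frames]


if __name__ == "__main__":
    # The same ambiguous observation (0.5) on a downbeat and on a weak beat.
    for beat, score in zip((0, 1), rescore([(0, 0.5), (1, 0.5)])):
        print(f"beat {beat}: fused onset probability {score:.2f}")
```

With this rule, an ambiguous observation is pulled toward the prior of its metrical position, so the same detector output scores higher on a strongly accented beat than on a weak one. A baseline that ignores metrical position corresponds to using a prior of 0.5 everywhere, which leaves the observation probability unchanged.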