HarmoF0: Logarithmic Scale Dilated Convolution for Pitch Estimation

Weixing Wei, Peilin Li, Yi Yu, Wei Li
DOI: https://doi.org/10.1109/icme52920.2022.9858935
2022-01-01
Abstract: Sounds, especially music, contain various harmonic components scattered along the frequency dimension. It is difficult for standard convolutional neural networks to observe these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic scale spectrograms efficiently. Harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and at the same time reduces the number of parameters by more than 90%. We also find that it has stronger noise resistance and fewer octave errors.
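To illustrate why a logarithmic frequency axis suits dilated convolution, the sketch below computes how many bins separate a fundamental from its k-th harmonic. On a log2 axis with Q bins per octave, that gap is Q * log2(k) regardless of the pitch, so a fixed set of dilation offsets can line up with the overtones of any note. This is an illustrative sketch, not the authors' implementation: the function names and the values `Q = 48` and `FMIN = 27.5` are assumptions chosen for the example, and the abstract does not state the dilation rates used in MRDC-Conv.

```python
import math


def harmonic_bin_offsets(bins_per_octave, n_harmonics):
    """Offset, in log-frequency bins, of harmonic k above the fundamental (k = 1..n)."""
    return [round(bins_per_octave * math.log2(k)) for k in range(1, n_harmonics + 1)]


def log_position(freq_hz, fmin_hz, bins_per_octave):
    """Fractional bin index of freq_hz on a log2 axis that starts at fmin_hz."""
    return bins_per_octave * math.log2(freq_hz / fmin_hz)


# Hypothetical settings, chosen for illustration only.
Q = 48       # bins per octave
FMIN = 27.5  # lowest frequency on the axis, in Hz

offsets = harmonic_bin_offsets(Q, 6)
print(offsets)  # [0, 48, 76, 96, 111, 124]

# The gap between a fundamental and its 3rd harmonic is the same for any f0.
for f0 in (110.0, 220.0, 523.25):
    gap = log_position(3 * f0, FMIN, Q) - log_position(f0, FMIN, Q)
    assert math.isclose(gap, Q * math.log2(3))
```

On a linear frequency axis the same gap would be (k - 1) * f0, which changes with the pitch, so no single set of fixed offsets could track the harmonics of every note.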