SIFT-based local spectrogram image descriptor: a novel feature for robust music identification

Xiu Zhang,Bilei Zhu,Linwei Li,Wei Li,Xiaoqiang Li,Wei Wang,Peizhong Lu,Wenqiang Zhang
DOI: https://doi.org/10.1186/s13636-015-0050-0
2015-01-01
Abstract:Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Therefore, robustness plays a crucial role in music identification technique. In this paper, we propose to use scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against serious time stretching and pitch shifting simultaneously. In addition, a locality sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factor of time stretching and pitch shifting that music queries might have experienced can be accurately estimated.
What problem does this paper attempt to address?