Joint Scattering for Automatic Chick Call Recognition

Changhong Wang, Emmanouil Benetos, Shuge Wang, Elisabetta Versace
DOI: https://doi.org/10.48550/arXiv.2110.03965
2021-10-08
Abstract: Animal vocalisations contain important information about health, emotional state, and behaviour, and can therefore potentially be used for animal welfare monitoring. Motivated by the spectro-temporal patterns of chick calls in the time-frequency domain, in this paper we propose an automatic system for chick call recognition using the joint time-frequency scattering transform (JTFS). Taking full-length recordings as input, the system first extracts chick call candidates using an onset detector and silence removal. Their JTFS features are then computed, and a support vector machine classifier assigns each candidate to a chick call type. Evaluated on a dataset comprising 3013 chick calls collected in laboratory conditions, the proposed recognition system using the JTFS features improves the frame- and event-based macro F-measures by 9.5% and 11.7%, respectively, over a mel-frequency cepstral coefficients baseline.
Audio and Speech Processing, Sound
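
The abstract reports results as macro F-measures, that is, the unweighted mean of the per-class F-measures, so rare call types count as much as common ones. The following is a minimal sketch of that metric only, not the authors' evaluation code: the function name `macro_f_measure` and the call-type labels `"a"`, `"b"`, `"c"` are hypothetical, and how frames or events are matched to the ground truth is not specified in the abstract and is assumed to have been done already.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F-measures.

    y_true and y_pred are equal-length sequences of class labels,
    one per frame (or per matched event). Hypothetical helper for
    illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed

    # Every class that appears in either sequence contributes equally.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F = 2TP / (2TP + FP + FN); the denominator is non-zero
        # because each label occurs at least once in y_true or y_pred.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom)
    return sum(scores) / len(scores)


# Toy example with made-up labels for three call types.
reference = ["a", "a", "b", "b", "c"]
predicted = ["a", "b", "b", "b", "c"]
score = macro_f_measure(reference, predicted)
```

In this toy example the per-class F-measures are 2/3, 4/5, and 1, so `score` is their mean, 37/45.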