Weighted fast sequential DTW for multilingual audio Query-by-Example retrieval

Jozef Vavrek, Peter Viszlay, Martin Lojka, Jozef Juhár, Matúš Pleva
DOI: https://doi.org/10.1007/s10844-018-0499-2
2018-02-19
Journal of Intelligent Information Systems
Abstract: This paper examines multilingual audio Query-by-Example (QbE) retrieval using the posteriorgram-based Phonetic Unit Modelling (PUM) approach and the Weighted Fast Sequential Dynamic Time Warping (WFSDTW) algorithm. The PUM approach employs phone recognizers trained in a supervised way on language-specific external resources, so information about the phonetic distribution is embedded in the acoustic modelling process. The resulting acoustic models were also used for language-independent QbE retrieval. The improved WFSDTW algorithm was implemented to perform retrieval for each query (keyword) within a given utterance file. The main focus is on measuring the retrieval performance of the proposed WFSDTW solution, which employs posteriorgram-based keyword matching with Gaussian mixture modelling (GMM). Score normalization and fusion of four different language-dependent sub-systems were carried out using a simple max-score merging strategy. The results show that the proposed WFSDTW solution outperforms, to a certain degree, the two other evaluated techniques, namely the basic DTW and segmental DTW algorithms. The combination of multiple PUM techniques with WFSDTW also proved to be an effective solution for the QbE task.
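The max-score merging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which score normalization was used, so the z-normalization below is an assumption, and all names (`z_normalize`, `max_score_fusion`, the toy scores) are hypothetical illustrations rather than the authors' implementation.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Z-normalize one sub-system's scores (assumed normalization, not from the paper)."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        # All scores identical: no candidate stands out in this sub-system.
        return {key: 0.0 for key in scores}
    return {key: (value - mu) / sigma for key, value in scores.items()}


def max_score_fusion(subsystem_scores):
    """For each (query, utterance) pair, keep the highest normalized score."""
    fused = {}
    for scores in subsystem_scores:
        for key, value in z_normalize(scores).items():
            if key not in fused or value > fused[key]:
                fused[key] = value
    return fused


# Hypothetical detection scores from two language-dependent sub-systems.
sys_a = {("q1", "utt1"): 0.9, ("q1", "utt2"): 0.1}
sys_b = {("q1", "utt1"): 0.2, ("q1", "utt2"): 0.4, ("q1", "utt3"): 0.6}

fused = max_score_fusion([sys_a, sys_b])
for pair, score in sorted(fused.items()):
    print(pair, round(score, 3))
```

Normalizing each sub-system before merging keeps a sub-system with a wider raw score range from dominating the maximum. A pair detected by only one sub-system simply keeps that sub-system's normalized score.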
computer science, information systems, artificial intelligence