Topological Data Analysis for Speech Processing
Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev
DOI: https://doi.org/10.21437/Interspeech.2023-1861
2023-06-06
Abstract: We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ accuracy and $5\%$ ERR on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with an accuracy of $80.155$. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available at https://topohubert.github.io/speech-topology-webpages/
Sound, Computation and Language, Machine Learning, Audio and Speech Processing, Algebraic Topology
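
To make the idea of a topological feature derived from an attention map more concrete, here is a minimal sketch. It is not the authors' feature set or code. It assumes one common construction: the attention matrix is symmetrized, converted to a distance matrix as 1 minus the attention weight, and summarized by the lengths of its zero-dimensional (H0) persistence bars, which coincide with the edge weights of a minimum spanning tree. The function names `attention_to_distance` and `h0_bar_lengths` and the toy matrix are hypothetical.

```python
import numpy as np


def attention_to_distance(attn: np.ndarray) -> np.ndarray:
    """Turn a square attention map into a symmetric distance matrix.

    Assumption for this sketch: distance = 1 - max(a_ij, a_ji).
    """
    sym = np.maximum(attn, attn.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    return dist


def h0_bar_lengths(dist: np.ndarray) -> np.ndarray:
    """Return the finite H0 bar lengths of the Vietoris-Rips filtration.

    Every connected component is born at 0 and dies when it merges with
    another one, so the finite bar lengths are exactly the edge weights of
    a minimum spanning tree. Prim's algorithm is used here.
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known edge from the tree to each vertex
    bars = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        bars.append(float(candidates[j]))
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(bars))


# Toy 3-token attention map (rows sum to 1); values are made up.
toy_attention = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.5, 0.5],
])
bars = h0_bar_lengths(attention_to_distance(toy_attention))
h0_sum = float(bars.sum())  # one scalar feature for this attention head
```

In a pipeline of the kind the abstract describes, one such scalar per attention head and layer could be concatenated into a feature vector and passed to a linear classifier; which features and which classifier the paper actually uses is documented in the paper and its supplementary materials, not in this sketch.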