On Feature Importance and Interpretability of Speaker Representations

Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach
2023-10-19
Abstract: Unsupervised speech disentanglement aims at separating fast-varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask which properties of a speaker's voice are captured, and we investigate, using the concept of Shapley values, to what extent individual embedding vector components are responsible for them. Our findings show that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the more abstract voice quality features investigated cannot.
Audio and Speech Processing