Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius, Dietrich Klakow
DOI: https://doi.org/10.48550/arXiv.2010.11973
2020-10-22
Computation and Language
Abstract: Deep neural networks have been employed for various spoken language recognition tasks, including tasks that are multilingual by definition such as spoken language identification. In this paper, we present a neural model for Slavic language identification in speech signals and analyze its emergent representations to investigate whether they reflect objective measures of language relatedness and/or non-linguists' perception of language similarity. While our analysis shows that the language representation space indeed captures language relatedness to a great extent, we find perceptual confusability between languages in our study to be the best predictor of the language representation similarity.