Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example

Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert, Sebastian Stober
DOI: https://doi.org/10.21437/Interspeech.2024-328
2024-10-21
Abstract: Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains, which is essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.
Artificial Intelligence, Sound, Audio and Speech Processing, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the failure of existing voice conversion methods to preserve prosody and distinctive speech patterns in elderly and pathological speech. This failure undermines both privacy protection and the retention of clinically relevant features needed for remote health monitoring. Specifically:

1. **Need for Voice Anonymization**: Cloud-based voice technology is widely used by the elderly and by people with speech disorders. Their recordings contain highly sensitive personal data, so they must be anonymized before they are shared.
2. **Limitations of Existing Methods**: Existing voice conversion methods can anonymize speech to some extent, but they often fail to preserve the prosody and unique speech patterns of elderly and pathological speech. These patterns, such as hoarseness in dementia patients, are crucial for remote health monitoring.
3. **Proposed New Method**: The authors propose DDSP-QbE, a voice conversion method based on Differentiable Digital Signal Processing (DDSP) and Query-by-Example (QbE). It is trained with new loss functions that help the model disentangle linguistic, prosodic, and domain representations, so that it can adapt to uncommon speech patterns.
4. **Evaluation and Validation**: Objective and subjective evaluations show that DDSP-QbE significantly outperforms state-of-the-art voice conversion methods in intelligibility, prosody preservation, and domain preservation across diverse datasets, pathologies, and speakers, while maintaining speech quality and speaker anonymity. Experts also validated domain preservation by analyzing twelve clinically pertinent domain attributes.

In summary, the paper proposes a new voice conversion method that overcomes the shortcomings of existing methods on elderly and pathological speech, protecting speaker privacy while retaining clinically relevant voice features.
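
For readers unfamiliar with the two building blocks named in the method, the following is a minimal, generic sketch and not the authors' implementation. It assumes that the DDSP component resembles a standard harmonic (sum-of-sinusoids) oscillator driven by a fundamental frequency, and that query-by-example can be approximated by replacing each source feature frame with the mean of its nearest neighbours in a pool of target-speaker frames. The function names `harmonic_synth` and `knn_match`, the 16 kHz sample rate, and `k=4` are all hypothetical choices made for illustration.

```python
import numpy as np


def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Generic DDSP-style harmonic oscillator (illustrative assumption).

    f0_hz:         per-sample fundamental frequency, shape [T]
    harmonic_amps: per-sample amplitude of each harmonic, shape [T, K]
    Returns a mono waveform of shape [T].
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    harmonic_amps = np.asarray(harmonic_amps, dtype=np.float64)
    n_harmonics = harmonic_amps.shape[1]
    harmonic_numbers = np.arange(1, n_harmonics + 1)

    # Frequency of every harmonic at every sample, shape [T, K].
    freqs = f0_hz[:, None] * harmonic_numbers[None, :]
    # Mute harmonics at or above the Nyquist frequency to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2, harmonic_amps, 0.0)
    # Integrate instantaneous frequency to obtain phase in radians.
    phase = 2.0 * np.pi * np.cumsum(freqs / sample_rate, axis=0)
    return np.sum(amps * np.sin(phase), axis=1)


def knn_match(query_feats, pool_feats, k=4):
    """Nearest-neighbour feature matching (illustrative stand-in for QbE).

    query_feats: source-utterance frames, shape [Tq, D]
    pool_feats:  target-speaker frames,   shape [Tp, D]
    Each query frame is replaced by the mean of its k most
    cosine-similar pool frames. Frames are assumed to be non-zero.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    similarity = q @ p.T                               # [Tq, Tp]
    top_k = np.argsort(-similarity, axis=1)[:, :k]     # [Tq, k]
    return pool_feats[top_k].mean(axis=1)              # [Tq, D]
```

In a pipeline of this general shape, the matched features would carry the target speaker's identity while the fundamental frequency taken from the source would carry the prosody; how the paper actually combines these components, and which losses it uses, is not specified in the abstract.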