Abstract: Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can carry a lot of sensitive user information, such as gender, identity, and sensitive content, and new types of security and privacy breaches have emerged as a result. Users do not want their sensitive personal information exposed to malicious attacks by untrusted third parties. An SLU system therefore needs to ensure that a potential attacker cannot deduce users' sensitive attributes, without greatly compromising SLU accuracy. To address this challenge, this paper proposes a novel multi-task privacy-preserving SLU model that defends against both automatic speech recognition (ASR) and identity recognition (IR) attacks. The model uses a hidden-layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, while the other two types of information, those targeted by the ASR and IR attacks, are removed to obtain a privacy-secure hidden layer. To achieve a good balance between efficiency and privacy, we introduce a new model pre-training mechanism, namely joint adversarial training, to further enhance user privacy. Experiments on two SLU datasets show that the proposed method reduces the accuracy of both ASR and IR attacks to close to that of a random guess, while leaving SLU performance largely unaffected.

Large-Scale Unsupervised Pre-Training for End-to-End Spoken Language Understanding

Understanding Semantics from Speech Through Pre-training

End-to-End Cross-Lingual Spoken Language Understanding Model with Multilingual Pretraining

Semi-Supervised Spoken Language Understanding Via Self-Supervised Speech and Language Model Pretraining

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream

The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Generating More Audios for End-to-End Spoken Language Understanding

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

On joint training with interfaces for spoken language understanding

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

A Survey on Speech Large Language Models

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Privacy-Preserving End-to-End Spoken Language Understanding