Abstract: Research on spoken language understanding (SLU) systems has progressed rapidly over the past decades. Intent detection and slot filling are the two main tasks in building an SLU system. Multiple deep-learning-based models have demonstrated good results on these tasks. The most effective algorithms are based on sequence-to-sequence (or "encoder-decoder") models, and generate the intents and semantic tags either with separate models (Yao, et al. Spoken language understanding using long short-term memory neural networks. South Lake Tahoe. 189–194, 2014; Mesnil, et al. IEEE/ACM Trans Audio Speech Lang Process. 23:530–539, 2014; Peng, et al. Recurrent neural networks with external memory for spoken language understanding. Proceedings of the 2015 Conference on Natural Language Processing and Chinese Computing. 9362:25–35, 2015; Kurata, et al. Leveraging sentence-level information with encoder LSTM for semantic slot filling. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2077–2083, 2016; Hahn, et al. IEEE Trans Audio Speech Lang Process. 19:1569–1583, 2011) or with a joint model (Liu and Lane. Attention-based recurrent neural network models for joint intent detection and slot filling. Proceedings of Interspeech. 685–689, 2016; Hakkani-Tür, et al. Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. Proceedings of Interspeech. 715–719, 2016; Guo, et al. Joint semantic utterance classification and slot filling with recursive neural networks. Proceedings of the 2014 IEEE Spoken Language Technology Workshop. 554–559, 2014). Most previous studies, however, either treat intent detection and slot filling as two separate parallel tasks, or use a sequence-to-sequence model to generate both the semantic tags and the intent.
Most of these approaches use a single (joint) NN-based model (including the encoder-decoder structure) to model both tasks, and hence may not fully exploit the cross-impact between them. In this chapter, new Bi-model-based RNN semantic frame parsing network structures are designed to perform intent detection and slot filling jointly, modeling the two tasks' cross-impact on each other with two correlated bidirectional LSTMs (BLSTMs). The Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data (Hemphill, et al. The ATIS spoken language systems pilot corpus. Proceedings of a Workshop Held at Hidden Valley. 96–101, 1990; Tur, et al. What is left to be understood in ATIS? IEEE Spoken Language Technology Workshop. 19–24, 2010), improving intent accuracy by about 0.5% and slot filling by about 0.9%.
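The two tasks differ in output granularity: intent detection assigns a single label to the whole utterance, while slot filling emits one semantic tag per token, typically in the IOB (BIO) scheme used for ATIS. The sketch below shows that output format and decodes the per-token tags into slot spans. The utterance and its labels are an assumed, ATIS-style illustration rather than an entry copied from the corpus, and `extract_slots` is a hypothetical helper, not part of any cited system.

```python
from typing import List, Optional, Tuple


def extract_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO slot tags into (slot_name, text) spans (hypothetical helper)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_name: Optional[str] = None
    current_words: List[str] = []
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag that does not continue the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_name):
            if current_name is not None:
                spans.append((current_name, " ".join(current_words)))
            current_name, current_words = tag[2:], [token]
        elif tag.startswith("I-"):
            current_words.append(token)  # continue the open span
        else:  # "O": outside any slot, so close the open span if there is one
            if current_name is not None:
                spans.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        spans.append((current_name, " ".join(current_words)))
    return spans


# Assumed ATIS-style example: one intent label, one slot tag per token.
tokens = "show me flights from boston to new york".split()
slot_tags = ["O", "O", "O", "O", "B-fromloc.city_name", "O",
             "B-toloc.city_name", "I-toloc.city_name"]
intent = "atis_flight"

print(intent, extract_slots(tokens, slot_tags))
# atis_flight [('fromloc.city_name', 'boston'), ('toloc.city_name', 'new york')]
```

Because the slot tags are aligned one-to-one with the tokens, slot filling is usually framed as sequence labeling, while intent detection is utterance-level classification.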

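The cross-impact idea behind the Bi-model structure can be sketched as two task-specific BLSTMs that read the same utterance, where each task's output layer sees the hidden states of both networks. This is a minimal, untrained sketch under stated assumptions, not the authors' implementation: it uses plain Python lists instead of a deep learning framework, random weights, and an assumed decoder-free wiring (the intent is predicted from the last time step of both BLSTMs, and each slot tag from both BLSTMs' states at that token). The decoder variant that the abstract reports as best is not modeled, and `BiModelSketch` and its helpers are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(v: Vector) -> Vector:
    top = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """Standard LSTM cell; the four gates share one stacked weight matrix."""
    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n = n_hidden
        self.w = rand_matrix(4 * n_hidden, n_in + n_hidden, rng)  # acts on [x; h]
        self.b = [0.0] * (4 * n_hidden)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = [a + b for a, b in zip(matvec(self.w, x + h), self.b)]
        n = self.n
        i = [sigmoid(v) for v in z[:n]]           # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
        g = [math.tanh(v) for v in z[3 * n:]]     # candidate cell update
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


class BLSTM:
    """Left-to-right and right-to-left LSTMs; per-step states are concatenated."""
    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n = n_hidden
        self.fwd = LSTMCell(n_in, n_hidden, rng)
        self.bwd = LSTMCell(n_in, n_hidden, rng)

    def _run(self, cell: LSTMCell, xs: List[Vector]) -> List[Vector]:
        h, c, states = [0.0] * self.n, [0.0] * self.n, []
        for x in xs:
            h, c = cell.step(x, h, c)
            states.append(h)
        return states

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        forward = self._run(self.fwd, xs)
        backward = self._run(self.bwd, xs[::-1])[::-1]  # re-align to token order
        return [f + b for f, b in zip(forward, backward)]


class BiModelSketch:
    """Hypothetical: one BLSTM per task; each output layer reads BOTH BLSTMs."""
    def __init__(self, n_in: int, n_hidden: int, n_intents: int, n_slots: int, seed: int = 0):
        rng = random.Random(seed)
        self.intent_net = BLSTM(n_in, n_hidden, rng)
        self.slot_net = BLSTM(n_in, n_hidden, rng)
        # Output input width: 2 networks x 2 directions x n_hidden.
        self.w_intent = rand_matrix(n_intents, 4 * n_hidden, rng)
        self.w_slot = rand_matrix(n_slots, 4 * n_hidden, rng)

    def __call__(self, xs: List[Vector]) -> Tuple[Vector, List[Vector]]:
        h1 = self.intent_net(xs)
        h2 = self.slot_net(xs)
        # Utterance-level intent from the last time step of both networks.
        intent_probs = softmax(matvec(self.w_intent, h1[-1] + h2[-1]))
        # One slot-label distribution per token, again from both networks.
        slot_probs = [softmax(matvec(self.w_slot, a + b)) for a, b in zip(h1, h2)]
        return intent_probs, slot_probs


# Toy usage: 3 tokens with 3-dimensional embeddings, untrained random weights.
model = BiModelSketch(n_in=3, n_hidden=4, n_intents=2, n_slots=5)
utterance = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, 0.0]]
intent_probs, slot_probs = model(utterance)
print(len(intent_probs), len(slot_probs), len(slot_probs[0]))  # 2 3 5
```

Keeping a separate BLSTM per task while sharing their hidden states at the output layers is one way to let each task inform the other without merging them into a single joint encoder. Training is omitted here; the printed shapes only confirm the data flow.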
Effective Spoken Language Labeling with Deep Recurrent Neural Networks

Label-Dependencies Aware Recurrent Neural Networks

Using Deep Time Delay Neural Network for Slot Filling in Spoken Language Understanding

Recurrent Neural Networks with Pre-trained Language Model Embedding for Slot Filling Task

Efficient Spoken Language Recognition via Multilabel Classification

Modified Recurrent Neural Networks in Spoken Language Understanding

Speech recognition with deep recurrent neural networks

Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

A Dual RNN Semantic Analysis Framework for Intent Classification and Slot

Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition

A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training

Bidirectional RNN for Audio Deep Learning in an End-to-End Model

Spoken Language Understanding Method Based on Recurrent Neural Network with Persistent Memory

A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding

Deep belief network based CRF for spoken language understanding

Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding

Deep Semantic Role Labeling With Self-Attention

Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network

Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition

A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

Towards Efficient Recurrent Architectures: A Deep LSTM Neural Network Applied to Speech Enhancement and Recognition