Abstract: Spoken dialogue systems (SDS) rely heavily on dialogue state tracking (DST) for success. However, training a DST model demands substantial computational power, because the tracker must follow states across both user and system utterances. Machine learning approaches have improved DST, but they have notable limitations. They often fail to handle slot values that were unseen during training, and they use two separate modules to extract, generate, or match slot values, which leads to high time and resource consumption. Moreover, learning and deducing relevant values for related slots remains an understudied challenge. To address these gaps, this paper introduces UTMGAT, a Unified Transformer with Memory Encoder and Graph Attention Networks (GAT) for multidomain DST. UTMGAT employs a BERT tokenizer to construct the vocabulary of user utterances and candidate sets, reducing the need for constant retraining when dealing with unseen values. It uses a single transformer both to gather dialogue context for slots and to generate slot values, improving prediction accuracy while reducing memory use and computation time. UTMGAT incorporates an embedding layer aggregator that filters out unnecessary values, identifies the nodes required by the GAT, and establishes relationships among the relevant values associated with related slots. This simplifies the graph representation and reduces the required computation power. The input to the GAT is padded so that its size stays equal across the batch. Finally, we experimentally evaluate our model against several models, including LLM-based approaches, on four popular datasets; our approach outperforms all competing models except two approaches on one dataset.
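
The padding and graph attention steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard single-head graph attention formulation (attention scores from a LeakyReLU over concatenated projected node features, normalized with a softmax over neighbors), and every name in it (`pad_nodes`, `masked_adjacency`, `gat_layer`, the toy shapes and the random weights) is hypothetical and chosen only for illustration.

```python
import numpy as np

def pad_nodes(node_feats, max_nodes):
    """Zero-pad an (n, d) node-feature matrix to (max_nodes, d).

    Returns the padded matrix and a boolean mask marking the real nodes.
    Assumes n <= max_nodes.
    """
    n, d = node_feats.shape
    padded = np.zeros((max_nodes, d), dtype=node_feats.dtype)
    padded[:n] = node_feats
    mask = np.zeros(max_nodes, dtype=bool)
    mask[:n] = True
    return padded, mask

def masked_adjacency(adj, mask):
    """Drop edges that touch padded nodes, then add self-loops.

    The self-loops guarantee that every row has at least one neighbor,
    so the softmax below is well defined even for padded rows.
    """
    kept = adj & mask[:, None] & mask[None, :]
    return kept | np.eye(len(mask), dtype=bool)

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention (standard formulation, assumed here).

    h: (N, d_in) node features, adj: (N, N) boolean adjacency,
    W: (d_in, d_out) projection, a: (2 * d_out,) attention vector.
    """
    z = h @ W                                    # projected features, (N, d_out)
    d_out = z.shape[1]
    # e[i, j] = a^T [z_i || z_j], computed as two dot products and broadcast.
    e = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    e = np.where(adj, e, -np.inf)                # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)         # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)     # attention over neighbors
    return alpha @ z, alpha

# Toy usage: 3 real nodes padded to 5 so the input size is fixed.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
h, mask = pad_nodes(feats, max_nodes=5)

adj = np.zeros((5, 5), dtype=bool)
adj[0, 1] = adj[1, 0] = True
adj[1, 2] = adj[2, 1] = True
A = masked_adjacency(adj, mask)

W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))
out, alpha = gat_layer(h, A, W, a)
out = out * mask[:, None]                        # zero the padded rows
```

Masking the adjacency matrix rather than the features keeps padded nodes from receiving or contributing attention, which is one common way to hold the input size fixed without changing the result for the real nodes.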
