Abstract: Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most popular and efficient way for human–computer interaction. This paper works toward designing more effective speech-based interfaces for querying structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem works in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from error compounding between the two components, resulting in limited performance. To handle these challenges, we propose a novel end-to-end neural architecture named SpeechSQLNet that directly translates human speech into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL from common natural language questions in spoken form, rather than from a natural language-based version of SQL. To validate the proposed problem and model, we further construct a dataset named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact-match accuracy. We expect Speech-to-SQL to inspire more research on effective and efficient human–machine interfaces that lower the barrier to using relational databases.
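The cascaded baseline and the error compounding it suffers from can be illustrated with a toy sketch. Everything below is hypothetical and for illustration only: `toy_asr` and `toy_text_to_sql` are hard-coded stand-ins, not the systems evaluated in the paper, and the exact-match check is a simple string comparison after normalization, which may differ from the metric definition the paper actually uses.

```python
import re


def normalize_sql(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and lowercase."""
    collapsed = re.sub(r"\s+", " ", sql).strip()
    return collapsed.rstrip(";").strip().lower()


def exact_match_accuracy(predicted, gold):
    """Fraction of predictions equal to the gold query after normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


def toy_asr(audio_id: str) -> str:
    """Hypothetical ASR stage returning canned transcripts.

    The second transcript contains a recognition error:
    the speaker said "sailors" but the stage outputs "tailors".
    """
    transcripts = {
        "utt1": "how many singers are there",
        "utt2": "list the names of all tailors",
    }
    return transcripts[audio_id]


def toy_text_to_sql(question: str) -> str:
    """Hypothetical rule-based text-to-SQL stage (assumed, not from the paper)."""
    words = question.split()
    if question.startswith("how many "):
        return f"SELECT COUNT(*) FROM {words[2]}"
    return f"SELECT name FROM {words[-1]}"


def cascade(audio_id: str) -> str:
    """Cascaded pipeline: the SQL stage only ever sees the ASR transcript."""
    return toy_text_to_sql(toy_asr(audio_id))


gold = ["SELECT count(*) FROM singers;", "SELECT name FROM sailors"]
predicted = [cascade("utt1"), cascade("utt2")]
accuracy = exact_match_accuracy(predicted, gold)
print(predicted, accuracy)
```

In this sketch the second query is wrong only because the transcript was wrong: the text-to-SQL stage cannot recover a table name it never saw. That is the kind of failure an end-to-end model operating directly on the speech signal is meant to avoid.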

Sevi: Speech-to-Visualization Through Neural Machine Translation

Natural Language to Visualization by Neural Machine Translation

Chat2VIS: Fine-Tuning Data Visualisations using Multilingual Natural Language Text and Pre-Trained Large Language Models

Quda: Natural Language Queries for Visual Data Analytics

SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction

Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks

Look Before you Speak: Visually Contextualized Utterances

Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Nvbench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task

Chat2VIS: Generating Data Visualisations via Natural Language using ChatGPT, Codex and GPT-3 Large Language Models

Seer: Language Instructed Video Prediction with Latent Diffusion Models

VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

NeuSpeech: Decode Neural signal as Speech

Speech-to-SQL: toward speech-driven SQL query generation from natural language question

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

VISHIEN-MAAT: Scrollytelling visualization design for explaining Siamese Neural Network concept to non-technical users

Natural Language Models for Data Visualization Utilizing nvBench Dataset

SNIL: Generating Sports News From Insights With Large Language Models

NeuralVis: Visualizing and Interpreting Deep Learning Models

SummVis: Interactive Visual Analysis of Models, Data, and Evaluation for Text Summarization