On The Role of Prompt Construction In Enhancing Efficacy and Efficiency of LLM-Based Tabular Data Generation

Banooqa Banday, Kowshik Thopalli, Tanzima Z. Islam, Jayaraman J. Thiagarajan

2024-09-06

Abstract: LLM-based data generation for real-world tabular data can be challenged by the lack of sufficient semantic context in feature names used to describe columns. We hypothesize that enriching prompts with domain-specific insights can improve both the quality and efficiency of data generation. To test this hypothesis, we explore three prompt construction protocols: Expert-guided, LLM-guided, and Novel-Mapping. Through empirical studies with the recently proposed GReaT framework, we find that context-enriched prompts lead to significantly improved data generation quality and training efficiency.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the poor quality and efficiency of LLM-based generation of real-world tabular data when feature names lack sufficient semantic context. Feature names in many real tabular datasets are ambiguous, contain abbreviations or symbols that are hard to interpret, or are merely generic labels, and any of these can degrade the output of large language models (LLMs). The authors therefore hypothesize that adding domain-specific knowledge to the prompt can improve both the quality of the generated tabular data and training efficiency. To test this hypothesis, the paper explores three prompt construction protocols:

1. **Expert-guided**: Domain experts provide detailed feature descriptions to enrich the prompt.
2. **LLM-guided**: An external LLM automatically generates feature descriptions from the given feature names and dataset name.
3. **Novel-Mapping**: An external LLM maps generic feature names to meaningful features in a new domain (such as physics or life sciences) according to their value ranges.

Experiments on multiple datasets show that these context-enriched prompts improve the quality of the generated data and significantly improve training efficiency, especially with parameter-efficient fine-tuning methods such as LoRA. The Novel-Mapping strategy also yields significant gains when feature names are entirely generic and carry no relevant context.
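As a rough illustration of what a context-enriched prompt might look like, the sketch below serializes a table row into "name is value" clauses, in the spirit of GReaT-style textual encoding, and optionally swaps each raw feature name for a richer description. The function `encode_row`, the column names, and the descriptions are all hypothetical and chosen only for illustration; the paper's actual prompt templates may differ.

```python
import random


def encode_row(row, descriptions=None, shuffle=True, seed=None):
    """Serialize one table row as comma-separated "<name> is <value>" clauses.

    `descriptions` maps a raw column name to a richer phrase (for example,
    one written by an expert or produced by an external LLM). Columns with
    no entry fall back to their raw name.
    """
    descriptions = descriptions or {}
    clauses = [
        f"{descriptions.get(col, col)} is {val}" for col, val in row.items()
    ]
    if shuffle:
        # Randomize clause order so the model does not depend on column order.
        random.Random(seed).shuffle(clauses)
    return ", ".join(clauses)


# Hypothetical row with terse or generic column names (illustrative only).
row = {"bp_sys": 128, "chol": 201, "A3": 0.42}

# Hypothetical descriptions; "A3" is left without one on purpose.
descriptions = {
    "bp_sys": "systolic blood pressure (mmHg)",
    "chol": "total serum cholesterol (mg/dL)",
}

baseline = encode_row(row, shuffle=False)
enriched = encode_row(row, descriptions, shuffle=False)

print(baseline)
print(enriched)
```

In this sketch, a column such as `A3` stays opaque in both encodings because no description is supplied for it, which is the kind of fully generic name the Novel-Mapping protocol is meant to handle by assigning a plausible feature from another domain.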

An Automatic Prompt Generation System for Tabular Data Tasks

Efficient Prompting for LLM-based Generative Internet of Things

A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization

Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

From Prompt Engineering to Prompt Science With Human in the Loop

A Survey on Prompting Techniques in LLMs

Grammar Prompting for Domain-Specific Language Generation with Large Language Models

Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways

Efficient Prompting Methods for Large Language Models: A Survey

Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks

LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language

Exploring Prompt Engineering Practices in the Enterprise

Revisiting Prompt Engineering via Declarative Crowdsourcing

Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

Towards a Catalog of Prompt Patterns to Enhance the Discipline of Prompt Engineering

An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide

Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation