Abstract: Text-to-Table aims to generate structured tables that convey the key information in unstructured documents. Existing text-to-table datasets are typically English-oriented, limiting research in non-English languages. Meanwhile, large language models (LLMs) such as ChatGPT have shown great success as general task solvers in multilingual settings, theoretically enabling text-to-table in other languages. In this paper, we propose a Chinese text-to-table dataset, CT-Eval, to benchmark LLMs on this task. Our preliminary analysis of English text-to-table datasets highlights two key factors for dataset construction: data diversity and data hallucination. Inspired by this, CT-Eval uses a popular Chinese multidisciplinary online encyclopedia as its source and covers 28 domains to ensure data diversity. To minimize data hallucination, we first train an LLM to judge and filter out task samples with hallucination, then employ human annotators to clean the hallucinations in the validation and test sets. After this process, CT-Eval contains 88.6K task samples. Using CT-Eval, we evaluate open-source and closed-source LLMs. Our results reveal that zero-shot LLMs (including GPT-4) still show a significant performance gap compared with human judgment. Furthermore, after fine-tuning, open-source LLMs significantly improve their text-to-table ability, outperforming GPT-4 by a large margin. In short, CT-Eval not only helps researchers evaluate and quickly understand the Chinese text-to-table ability of existing LLMs but also serves as a valuable resource for significantly improving the text-to-table performance of LLMs.

CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models