RealKIE: Five Novel Datasets for Enterprise Key Information Extraction

Benjamin Townsend, Madison May, Christopher Wells
2024-03-29
Abstract: We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications. The datasets span a diverse range of documents, including SEC S1 Filings, US Non-disclosure Agreements, UK Charity Reports, FCC Invoices, and Resource Contracts. Each presents distinct challenges, among them poor text serialization, sparse annotations in long documents, and complex tabular layouts. These datasets provide a realistic testing ground for key information extraction in applications such as investment analysis and legal data processing.
Computation and Language, Computer Vision and Pattern Recognition, Machine Learning