Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging

Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh, Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly
2023-06-06
Abstract: The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy. In this paper, we formulate the task of automating the assignment of a label to a particular numeral span in a sentence from an extremely large label set. Towards this task, we release a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794 labels. We benchmark the performance of the FNXL dataset by formulating the task as (a) a sequence labelling problem and (b) a pipeline with span extraction followed by Extreme Classification. Although the two approaches perform comparably, the pipeline solution provides a slight edge for the least frequent labels.
Computation and Language; Artificial Intelligence; Computational Engineering, Finance, and Science