TETA: Text-Enhanced Tabular Data Annotation with Multi-task Graph Convolutional Network

Hongzhi Wang, Haoshi Zhi, Shihao Jiang, Hua Zhang, Yifan Wu, Dai Griffiths
DOI: https://doi.org/10.1007/978-3-031-30675-4_38
2023-01-01
Abstract: Tabular data annotation, which aims to match cells (or columns) to their semantic entities (or types), is crucial to tackling the absence of table content. Recent approaches tend to learn embeddings for tabular data based on deep learning models, but are not conducive to parsing tabular data without metadata. While the metadata may not always be available, entity-related textual information can be easily obtained through external sources such as knowledge bases. Motivated by this, we introduce entity-related textual details in this study to enhance the understanding of tabular data. To obtain better embeddings, we propose a novel model TETA, which adopts the graph convolutional network to refine semantic and structure information from constructed graph features based on tables, entities, types, and text. Meanwhile, we adopt a multi-task learning technique to improve its performance and robustness. We compare TETA with five baselines on five datasets. The results of tabular data annotation and novelty classification demonstrate the effectiveness and promise of TETA.