Leveraging Domain Information to Classify Financial Documents via Unsupervised Graph Momentum Contrast

Xueni Luo, Dawei Cheng, Haorui Ma, Junhao Wang, Mengzhen Fan, Yifeng Luo
DOI: https://doi.org/10.1145/3459637.3482133
2021-10-26
Abstract: Financial documents often contain rich domain information, such as named entities, that can indicate a document's classification category. Existing classification methods either ignore this financial domain information, yielding suboptimal performance, or learn document representations in a supervised manner, which incurs expensive data labeling costs. In this paper, we propose to leverage domain information to improve the classification of financial documents via G-MoCo, a graph representation learning model based on unsupervised graph momentum contrast. With G-MoCo, we can extract latent features from massive unlabeled raw data and then use the learned representations for document classification. Compared with state-of-the-art baselines, the representations learned by our method improve performance by significant margins on a financial document dataset and three non-financial public graph datasets.
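To make "momentum contrast" concrete, here is a minimal sketch of the generic MoCo-style recipe, not G-MoCo's actual implementation. Three pieces are involved. The key encoder's parameters track the query encoder as an exponential moving average. A fixed-size FIFO queue holds past keys to serve as negatives. An InfoNCE loss pulls a query toward its positive key and pushes it away from the queued negatives. The graph encoder itself (for example, a GNN applied to two augmented views of the same subgraph) is abstracted away: parameters and embeddings are plain lists of floats. The names `momentum_update`, `info_nce_loss`, `KeyQueue`, and the defaults `m=0.999` and `temperature=0.07` are illustrative assumptions.

```python
import math
from collections import deque
from typing import List, Sequence

Vector = List[float]


def momentum_update(key_params: Sequence[float], query_params: Sequence[float],
                    m: float = 0.999) -> Vector:
    """Key encoder parameters become an exponential moving average of the query encoder's."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query: Sequence[float], positive_key: Sequence[float],
                  negative_keys: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> float:
    """Cross-entropy of picking the positive key (index 0) among positive + negatives."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp; loss = -log softmax(logits)[0].
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


class KeyQueue:
    """Fixed-size FIFO dictionary of negative keys; the oldest keys are evicted first."""

    def __init__(self, size: int) -> None:
        self.keys: deque = deque(maxlen=size)

    def enqueue(self, batch_keys: Sequence[Sequence[float]]) -> None:
        for k in batch_keys:
            self.keys.append(list(k))


if __name__ == "__main__":
    queue = KeyQueue(size=4)
    queue.enqueue([[0.0, 1.0], [-1.0, 0.0]])
    near = info_nce_loss([1.0, 0.0], [0.9, 0.1], list(queue.keys))
    far = info_nce_loss([1.0, 0.0], [-0.9, 0.1], list(queue.keys))
    print(near < far)  # True: a well-aligned positive key gives a lower loss
```

The slowly moving key encoder keeps the queued keys roughly consistent with one another even though they were encoded at different training steps. This is what lets the queue be much larger than a single batch without back-propagating through the keys.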