A Comparative Analysis of Temporal Long Text Similarity: Application to Financial Documents

Vipula Rawte, Aparna Gupta, Mohammed J. Zaki
DOI: https://doi.org/10.1007/978-3-030-66981-2_7
2021-01-01
Abstract: Temporal text documents exist in many real-world domains. They may span long periods of time, during which the text tends to vary considerably. In particular, the variations or similarities between a pair of documents from two consecutive years could be meaningful. Most textual analysis work, such as text classification, treats the entire text snippet as a single data instance. It is therefore important to study such year-over-year similarities in addition to the text document as a whole. In Natural Language Processing (NLP), the task of textual similarity is important for search and query retrieval. This task, better known as Semantic Textual Similarity (STS), aims to capture the semantics of two texts while comparing them. However, state-of-the-art methods predominantly target short texts, so measuring the semantic similarity between a pair of long texts remains a challenge. In this paper, we compare different text matching methods on documents from two consecutive years. We focus on their similarities for our comparative analysis and evaluation of financial documents, namely public 10-K filings to the SEC (Securities and Exchange Commission). We further perform textual regression analysis on six quantitative bank variables including Return on Assets (ROA), Earnings per Share (EPS), Tobin’s Q Ratio, Tier 1 Capital Ratio, Leverage Ratio, and Z-score, and show that textual features can be effective in predicting these variables.