Research on financial fraud identification of listed companies based on text data mining

Hualing Liu, Long Tao, Mengyao Liu
DOI: https://doi.org/10.1117/12.2581398
2020-11-10
Abstract: Identifying financial fraud by listed companies in a timely manner, and thereby helping investors avoid risk, is a key measure for promoting the healthy development of the capital market. To address the lag of traditional fraud identification methods, a text classification model suited to the Chinese capital market is constructed from user-generated content (UGC) on social media platforms. To detect fraudulent companies, TF-IDF features, topic features and explosive news quantity features are extracted under the guidance of systemic functional linguistics (SFL) theory. An empirical analysis is conducted on news and comments crawled for 124 companies from Eastmoney and JRJ.com. The results show that the proposed model can effectively extract implicit information from unstructured data and improves the timeliness of screening listed companies for fraudulent behavior. The topic and keyword features of text comments play an important role in distinguishing fraudulent listed companies from non-fraudulent ones. On the one hand, the model helps individual and institutional investors avoid investment traps; on the other hand, it helps regulators detect companies with fraud potential in time to prevent market risks.
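The abstract does not state which TF-IDF variant or classifier the authors use, so the following is only a minimal sketch of TF-IDF keyword features feeding a text classifier, under stated assumptions. It assumes comments are already word-segmented (Chinese text would first need a segmenter), uses tf = count / length and idf = log(N / df), and substitutes a simple nearest-centroid cosine classifier for the paper's unspecified model. The function names, toy tokens and labels are hypothetical, and the topic and explosive news quantity features are not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """TF-IDF for pre-tokenized docs (assumed variant: tf = count/len, idf = log(N/df))."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [transform(tokens, idf) for tokens in docs], idf


def transform(tokens, idf):
    """Weight a new doc with the training idf; unseen terms are dropped."""
    counts = Counter(tokens)
    length = len(tokens)
    return {t: (c / length) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(docs, labels):
    """Hypothetical nearest-centroid classifier: average TF-IDF vector per label."""
    vectors, idf = tfidf_vectors(docs)
    centroids = {}
    for label in sorted(set(labels)):  # sorted for deterministic tie-breaking
        members = [v for v, y in zip(vectors, labels) if y == label]
        total = defaultdict(float)
        for v in members:
            for t, w in v.items():
                total[t] += w
        centroids[label] = {t: w / len(members) for t, w in total.items()}
    return idf, centroids


def predict(tokens, idf, centroids):
    """Assign the label whose centroid is most cosine-similar to the doc."""
    vec = transform(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy comments (English tokens stand in for segmented Chinese text).
docs = [
    ["revenue", "inflated", "audit", "delay"],
    ["regulator", "inquiry", "audit", "restatement"],
    ["dividend", "growth", "steady", "orders"],
    ["new", "orders", "growth", "expansion"],
]
labels = ["fraud", "fraud", "normal", "normal"]

idf, centroids = train(docs, labels)
print(predict(["audit", "inquiry", "delay"], idf, centroids))
print(predict(["orders", "growth", "expansion"], idf, centroids))
```

Nearest-centroid is used here only to keep the sketch dependency-free; a practical pipeline would more likely pair a library vectorizer such as scikit-learn's `TfidfVectorizer` (whose default idf is smoothed, so its weights differ from this sketch) with a trained classifier.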