Abstract: Financial fraud is a serious worldwide problem that has severely damaged the sustainable growth of financial markets. Identifying fraud is challenging, however, because the data are highly imbalanced: non-fraudulent companies far outnumber fraudulent ones. Intelligent financial statement fraud detection systems have therefore been developed to support stakeholders' decision-making. Most current approaches, however, consider only the quantitative financial statement ratios and make little use of textual information for classification, especially comments written in Chinese. This paper therefore aims to develop an enhanced system for detecting financial fraud using state-of-the-art deep learning models that combine numerical features derived from financial statements with textual data from the managerial comments in the annual reports of 5130 Chinese listed companies. First, we construct a financial index system that includes both financial indices and the non-financial indices that previous research usually excluded. Then, textual features are extracted from the Management Discussion and Analysis (MD&A) section of the annual reports using word vectors. Finally, deep learning models are applied and their performance is compared on numeric data, textual data, and the combination of the two. The empirical results show that the proposed deep learning methods substantially outperform traditional machine learning methods: the LSTM and GRU approaches achieve correct classification rates of 94.98% and 94.62%, respectively, on the test samples. This indicates that the textual features extracted from the MD&A section yield promising classification results and substantially strengthen financial fraud detection.
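The combination of numeric and textual features described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy word vectors, the function names, and the example ratios are all hypothetical, and the text is summarized by simply averaging word vectors, whereas the LSTM and GRU models in the paper would consume the word vectors as a sequence.

```python
from typing import Dict, List, Sequence

# Hypothetical toy word vectors (assumption); a real system would learn
# them from the MD&A text of the annual reports.
TOY_VECTORS: Dict[str, List[float]] = {
    "revenue": [0.2, 0.1, 0.0],
    "growth": [0.4, 0.3, 0.2],
    "risk": [-0.3, 0.5, 0.1],
}


def mean_word_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse_features(
    numeric: Sequence[float],
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Concatenate numeric indices with the averaged text vector."""
    return list(numeric) + mean_word_vector(tokens, vectors, dim)


# Two made-up financial ratios plus three tokens, one of them unknown.
features = fuse_features(
    [1.5, 0.8], ["revenue", "growth", "unknown"], TOY_VECTORS, dim=3
)
```

The fused vector has one entry per numeric index followed by one entry per embedding dimension, so a downstream classifier sees both sources of information in a single input.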

Identification of Financial Report Fraud Based on Text Analysis

Corporate fraud detection based on linguistic readability vector: Application to financial companies in China

Research on financial fraud identification of listed companies based on text data mining

Research on Identification and Prediction of Financial Fraud of Listed Companies Based on Machine Learning

An Analysis on Financial Statement Fraud Detection for Chinese Listed Companies Using Deep Learning

Financial Fraud Detection of Listed Companies in China: A Machine Learning Approach

Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach

Empirical Analysis of Financial Fraud Identification in Chinese Listed Companies

Financial Fraud Identification Model of Listed Companies based on Time-Series Information

Financial Reporting Fraud Detection: An Analysis of Data Mining Algorithms

Application of Machine Learning Methods to Risk Assessment of Financial Statement Fraud: Evidence from China

A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud

Research on financial risk screening of listed companies based on clustering algorithm

Empirical Analysis of Financial Statement Fraud of Listed Companies Based on Logistic Regression and Random Forest Algorithm

Financial Fraud Detection and Prediction in Listed Companies Using SMOTE and Machine Learning Algorithms

A Financial Fraud Detection Model Based on Organizational Impression Management Strategy

Data mining of corporate financial fraud based on neural network model

Leveraging Financial Social Media Data for Corporate Fraud Detection

Can the text features of regulatory inquiry letters predict companies' financial restatements? Evidence from China

An intelligent detecting model for financial frauds in Chinese A‐share market

A deep learning approach of financial distress recognition combining text