Abstract: Financial fraud is a serious worldwide problem that has severely damaged the sustainable growth of financial markets. It is nevertheless challenging to identify fraud in highly imbalanced datasets, because non-fraudulent companies far outnumber fraudulent ones. Intelligent financial statement fraud detection systems have therefore been developed to support stakeholders' decision-making. However, most current approaches consider only the quantitative part of financial statements, namely the financial ratios, and make little use of textual information for classification, especially managerial comments written in Chinese. This paper therefore aims to develop an enhanced system for detecting financial fraud using state-of-the-art deep learning models that combine numerical features derived from financial statements with textual data from the managerial comments in the annual reports of 5130 Chinese listed companies. First, we construct a financial index system that includes both financial and non-financial indices that previous research usually excluded. Then, textual features are extracted from the Management Discussion and Analysis (MD&A) section of the annual reports using word vectors. After that, deep learning models are applied and their performance is compared on numeric data, textual data, and the combination of the two. The empirical results show a substantial performance improvement of the proposed deep learning methods over traditional machine learning methods: the LSTM and GRU approaches achieve correct classification rates of 94.98% and 94.62%, respectively, on the testing samples, indicating that the textual features extracted from the MD&A section yield promising classification results and substantially reinforce financial fraud detection.
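The combination of numerical and textual features described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two feature sets are fused, so everything below is an assumption: the function names `mean_word_vector` and `fuse_features` are hypothetical, the toy embeddings and index values are invented for illustration, and averaging word vectors is a simplification (an LSTM or GRU would normally consume the sequence of word vectors rather than their mean).

```python
from typing import Dict, List, Sequence


def mean_word_vector(tokens: Sequence[str],
                     embeddings: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token has an embedding: fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def fuse_features(financial: Sequence[float],
                  tokens: Sequence[str],
                  embeddings: Dict[str, List[float]],
                  dim: int) -> List[float]:
    """Concatenate numeric indices with the averaged MD&A text vector."""
    return list(financial) + mean_word_vector(tokens, embeddings, dim)


# Toy values, invented purely for illustration (not from the paper).
toy_embeddings = {
    "profit": [1.0, 0.0],
    "growth": [0.0, 1.0],
}
toy_financial = [0.5, 2.0]  # two hypothetical financial indices
toy_tokens = ["profit", "growth", "unknown"]  # "unknown" has no embedding

features = fuse_features(toy_financial, toy_tokens, toy_embeddings, dim=2)
print(features)  # [0.5, 2.0, 0.5, 0.5]
```

The resulting vector could then be passed to any classifier. Simple concatenation is used here only because it is the most direct way to present both feature types to a single model.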

From Spin to Swindle: Identifying Falsification in Financial Text

Identification of Financial Report Fraud Based on Text Analysis

Analyzing Financial Fraud Cases Using a Linguistics-Based Text Mining Approach

Fraud Detection in Financial Statements using Text Mining Methods: A Review

Leveraging Financial Social Media Data for Corporate Fraud Detection

Corporate fraud detection based on linguistic readability vector: Application to financial companies in China

Financial Reporting Fraud Detection: An Analysis of Data Mining Algorithms

Fraud detection in telephone conversations for financial services using linguistic features

Which Spoken Language Markers Identify Deception in High-Stakes Settings? Evidence From Earnings Conference Calls

Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach

Intelligent Fraud Detection in Financial Statements using Machine Learning and Data Mining: A Systematic Literature Review

Detecting Deception through Linguistic Analysis

Evidential Strategies in Financial Statement Analysis: A Corpus Linguistic Text Mining Approach to Bankruptcy Prediction

Detection of Fraud Statement Based on Word Vector: Evidence from Financial Companies in China

Finding Needles in a Haystack: Using Data Analytics to Improve Fraud Prediction

What are You Saying? Using Topic to Detect Financial Misreporting

Deceptive Financial Reporting Detection: A Hierarchical Clustering Approach Based on Linguistic Features

Unearthing Financial Statement Fraud: Insights from News Coverage Analysis

An Analysis on Financial Statement Fraud Detection for Chinese Listed Companies Using Deep Learning

Natural Language Processing and Text Mining Algorithms for Financial Accounting Information Disclosure

Can earnings conference calls tell more lies? A contrastive multimodal dialogue network for advanced financial statement fraud detection