Abstract: In today's world we are confronted with increasing amounts of information every day, coming from a large variety of sources. People and corporations are producing data on a large scale, and since the rise of the internet, e-mail and social media the amount of produced data has grown exponentially. From a law enforcement perspective, we have to deal with these huge amounts of data when a criminal investigation is launched against an individual or company. Relevant questions need to be answered: who committed the crime, who was involved, what happened and at what time, and who was communicating, and about what? Not only has the amount of data available for investigation increased enormously, but so has its complexity. When these communication patterns need to be combined with, for instance, a seized financial administration or corporate document shares, a complex investigation problem arises. Criminal investigators now face a huge challenge when evidence of a crime needs to be found in a Big Data environment, where they have to deal with large and complex datasets, especially in financial and fraud investigations. To tackle this problem, a financial and fraud investigation unit of a European country has developed a new tool named LES that uses Natural Language Processing (NLP) techniques to help criminal investigators handle large amounts of textual information more efficiently and quickly. In this paper, we briefly present this tool and focus on evaluating its performance against the requirements of forensic investigation: making the work faster, smarter and easier for investigators. To evaluate the LES tool, we use several performance metrics. We also show experimental results of our evaluation on large and complex datasets from a real-world application.

The Enron Corpus: Where the Email Bodies are Buried?

Analysis of Communication Pattern with Scammers in Enron Corpus

Network and Sentiment Analysis of Enron Emails

Text Categorization of Enron Email Corpus Based on Information Bottleneck and Maximal Entropy

Analyzing the risk and financial impact of phishing attacks using a knowledge based approach

Performance Evaluation of a Natural Language Processing approach applied in White Collar crime investigation

Leveraging Financial Social Media Data for Corporate Fraud Detection

Distinguishing Scams and Fraud with Ensemble Learning

How to Detect and Forecast Corporate Fraud by Media Reports? an Approach Using Machine Learning and Qualitative Comparative Analysis

Code Word Detection in Fraud Investigations using a Deep-Learning Approach

Finding top performers through email patterns analysis

Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts

Email Spam Detection using Deep Learning Approach

Tone at the Bottom: Measuring Corporate Misconduct Risk from the Text of Employee Reviews

E-NER -- An Annotated Named Entity Recognition Corpus of Legal Text

The Anatomy of Deception: Technical and Human Perspectives on a Large-scale Phishing Campaign

Evidential Strategies in Financial Statement Analysis: A Corpus Linguistic Text Mining Approach to Bankruptcy Prediction

Finding Needles in a Haystack: Using Data Analytics to Improve Fraud Prediction

The Anatomy of Deception: Measuring Technical and Human Factors of a Large-scale Phishing Campaign

Scamming Higher Ed: An Analysis of Phishing Content and Trends

Entity Extraction from High-Level Corruption Schemes via Large Language Models