Fraud Analytics Using Machine-learning & Engineering on Big Data (FAME) for Telecom

Sudarson Roy Pratihar, Subhadip Paul, Pranab Kumar Dash, Amartya Kumar Das
2023-10-31
Abstract: The telecom industry loses 46.3 billion USD globally to fraud. Data mining and machine learning techniques (apart from rule-oriented approaches) have been used in the past, but their efficiency has been low because fraud patterns change very rapidly. This paper presents an industrialized solution approach that uses a self-adaptive data mining technique and big data technologies to detect fraud and discover novel fraud patterns in an accurate, efficient, and cost-effective manner. The solution has been successfully demonstrated to detect International Revenue Share Fraud with <5% false positives. More than 1 terabyte of call detail records from a reputed wholesale carrier and overseas telecom transit carrier was used to conduct this study.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses fraud in the telecommunications industry, specifically International Revenue Share Fraud (IRSF). Telecom fraud costs the global industry approximately $46.3 billion each year, and IRSF accounts for about $10.76 billion of that loss. Traditional rule-based methods and data mining techniques detect fraud inefficiently because fraudsters constantly adopt new methods, so fraud patterns change rapidly. Processing the data in time is a further challenge, since call detail records can amount to several hundred GB per day. To address both problems, the paper proposes an industrialized solution that combines adaptive data mining techniques with big data technology to detect fraud and discover new fraud patterns efficiently, accurately, and cost-effectively. The solution has been successfully applied to IRSF detection, with a false positive rate of less than 5%. Its effectiveness was validated on over 1.2 TB of historical call detail records, and it can be extended to other types of telecom fraud and to similar kinds of fraud in other domains.
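The summary does not say how the "less than 5%" false positive figure is computed. As a minimal sketch, the two most common readings can be written as follows. The function names and all counts below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of legitimate calls that were wrongly flagged: FP / (FP + TN)."""
    legitimate = fp + tn
    if legitimate == 0:
        raise ValueError("no legitimate calls to evaluate")
    return fp / legitimate


def false_discovery_rate(fp: int, tp: int) -> float:
    """Share of raised alerts that were wrong: FP / (FP + TP)."""
    flagged = fp + tp
    if flagged == 0:
        raise ValueError("no alerts raised")
    return fp / flagged


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn = 95, 4, 9896

fpr = false_positive_rate(fp, tn)   # measured against all legitimate calls
fdr = false_discovery_rate(fp, tp)  # measured against all raised alerts

print(f"false positive rate:  {fpr:.4%}")
print(f"false discovery rate: {fdr:.4%}")
```

Because fraudulent calls are rare, the two measures can differ by orders of magnitude on the same data, so it matters which one a reported threshold refers to.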