Abstract: In this paper, we build online models incrementally using online outlier detection algorithms in a streaming environment. We identify a strong need for streaming models to handle streaming data. The objective of this project is to study and analyze the importance of streaming models that are applicable in real-world environments. In this work, we built various Outlier Detection (OD) algorithms, viz., One-Class Support Vector Machine (OC-SVM), Isolation Forest Adaptive Sliding Window approach (IForest ASD), Exact Storm, Angle-Based Outlier Detection (ABOD), Local Outlier Factor (LOF), KitNet, and KNN ASD. We evaluated the effectiveness and validity of these models on various finance problems such as credit card fraud detection, churn prediction, and Ethereum fraud prediction. Further, we analyzed the performance of the models on health care prediction problems such as heart stroke prediction and diabetes prediction. The results show that the models perform well on highly imbalanced datasets, that is, datasets in which the negative class is the majority and the positive class is the minority. Among all the models, the ensemble strategy IForest ASD outperformed the others in most cases, ranking among the top three models in almost all of them.

What problem does this paper attempt to address?

This paper addresses how streaming data analytics can be used for incremental outlier detection (OD) in finance and healthcare. Specifically, the authors study the importance of models suited to real-time data streams and develop an online incremental outlier detection framework for problems in finance (credit card fraud detection, customer churn prediction, Ethereum fraud prediction) and healthcare (stroke prediction, diabetes prediction).

### Main problems:

1. **Challenges in processing stream data**: Unlike data handled by traditional batch processing, stream data is generated continuously and must be processed in real time. Traditional offline models therefore cannot handle such dynamically changing data effectively.
2. **Processing highly imbalanced data sets**: In many practical scenarios the data is highly imbalanced, that is, negative-class samples far outnumber positive-class samples. In fraud detection, for example, normal transactions far outnumber fraudulent ones. Keeping the model effective in this situation is a key issue.
3. **Selecting appropriate outlier detection algorithms**: The authors compare multiple outlier detection algorithms (such as One Class SVM, Isolation Forest ASD, and Exact Storm) and evaluate their performance in a streaming environment.

### Solutions:

- **Incremental modeling**: With the sliding-window method, the model is updated continuously as new data arrives, so it adapts to dynamic changes in the data.
- **Online learning**: The model is trained while it receives new data, so abnormal behavior can be captured in a timely manner.
- **Performance evaluation**: Comparing online and offline models verifies the superiority of the incremental model in processing stream data.
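
The sliding-window idea described above can be illustrated with a minimal sketch. This is not the authors' implementation, nor an exact form of any algorithm they list: it is a simplified distance-based detector that keeps only the most recent points and flags a new point when too few of them lie nearby. The class name `SlidingWindowDetector`, the helper `detect`, and the parameters `window_size`, `radius`, and `min_neighbors` are hypothetical names chosen for illustration.

```python
from collections import deque
import math


class SlidingWindowDetector:
    """Hypothetical detector: a point is an outlier if it has too few
    neighbours within `radius` among the points currently in the window."""

    def __init__(self, window_size=50, radius=1.0, min_neighbors=3):
        # deque(maxlen=...) drops the oldest point automatically when full
        self.window = deque(maxlen=window_size)
        self.radius = radius
        self.min_neighbors = min_neighbors

    def score_then_learn(self, point):
        """Judge `point` against the current window, then add it to the window."""
        neighbors = sum(
            1 for past in self.window if math.dist(past, point) <= self.radius
        )
        # During warm-up the window is too small to judge, so nothing is flagged.
        warmed_up = len(self.window) >= self.min_neighbors
        is_outlier = warmed_up and neighbors < self.min_neighbors
        # Incremental update: the "model" is simply the window contents.
        self.window.append(point)
        return is_outlier


def detect(stream, **kwargs):
    """Run the detector over an iterable of points, one point at a time."""
    detector = SlidingWindowDetector(**kwargs)
    return [detector.score_then_learn(p) for p in stream]


# Toy stream: 20 points near the origin, one far-away point, one normal point.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
stream = normal * 5 + [(10.0, 10.0), (0.0, 0.0)]
flags = detect(stream, window_size=10, radius=1.0, min_neighbors=3)
print([i for i, flagged in enumerate(flags) if flagged])  # -> [20]
```

Each point is scored before it is learned, so the detector never uses a point to judge itself. Because old points leave the window, the notion of "normal" follows the stream over time, which is the property the incremental setting relies on.
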
### Experimental results:

The experimental results show that for highly imbalanced data sets (such as stroke prediction data sets and Ethereum fraud detection data sets), the incremental model (Scenario 2) performs better than the offline model (Scenario 1). In particular, the Isolation Forest ASD model performs best on such data sets and is usually among the top three of all models.

### Summary:

The main contribution of this paper is to propose an incremental outlier detection framework applicable to stream data and prove its effectiveness in handling highly imbalanced data sets through experiments. Future work can further optimize the robustness of the model and improve the detection accuracy by integrating multiple models.

Incremental Outlier Detection Modelling Using Streaming Analytics in Finance & Health Care

An efficient modelling of oversampling with optimal deep learning enabled anomaly detection in streaming data

Continuous Angle-based Outlier Detection on High-dimensional Data Streams.

ATM Fraud Detection using Streaming Data Analytics

Outlier Detection in Streaming Data for Telecommunications and Industrial Applications: A Survey

Concept drift and machine learning model for detecting fraudulent transactions in streaming environment

Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

An Efficient Outlier Detection with Deep Learning-Based Financial Crisis Prediction Model in Big Data Environment

Empirical Analysis of Lifelog Data using Optimal Feature Selection based Unsupervised Logistic Regression (OFS-ULR) Model with Spark Streaming

Cube-based Incremental Outlier Detection for Streaming Computing

Comparative Study of Real Time Machine Learning Models for Stock Prediction through Streaming Data

Dynamic Micro-cluster-Based Streaming Data Clustering Method for Anomaly Detection.

Computationally Assisted Quality Control for Public Health Data Streams

Online Feature Selection for Streaming Features with High Redundancy Using Sliding-Window Sampling

Nowcasting the Financial Time Series with Streaming Data Analytics under Apache Spark

IPMOD: An efficient outlier detection model for high-dimensional medical data streams

A new online learning algorithm for streaming data and decision support with a Bayesian approach

Adaptation of the Algorithm for Detecting Anomalies in Time Series for Non-Stationary Streaming Data

Towards An Online Incremental Approach to Predict Students Performance

On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data

Online Learning From Incomplete and Imbalanced Data Streams