Abstract: Data defects degrade data quality and the results of data mining. This paper presents a data-defect inspection method based on the kernel-neighbor-density-change outlier factor (KNDCOF). A kernel neighbor density is defined to represent the density of each object in the database, and the ascending distance series (ADS) of each object is computed from the kernel distances between the object and its neighbors. The average density fluctuation (ADF) of the object is then defined as the weighted sum of the squared density differences between the object and the other objects in its ADS. Finally, the KNDCOF of the object is the ratio of the object's ADF to the average ADF of its neighbors. The KNDCOF value indicates the degree to which the object is an outlier. Experiments on three real data sets evaluate the effectiveness of the proposed method. The results show that the proposed method achieves higher-quality data-defect inspection without increasing the time complexity.

Note to Practitioners: Data-defect inspection is an important step of data preprocessing in real industrial processes. This paper presents a data-defect inspection method based on the kernel-neighbor-density-change outlier factor to identify outliers, and it addresses the challenges posed by the strong correlation and nonlinearity of industrial data. The proposed method calculates an outlier factor for each object, which quantifies how outlying the object is. The outlier factor is based on the density difference between the object and its neighbors: the larger the outlier factor, the more outlying the object. The proposed method could be widely used on complex industrial data sets containing regions of different density. In the industrial field, engineers can handle objects with high outlier factor values according to actual requirements.
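The pipeline outlined in the abstract (kernel distance, ADS, ADF, KNDCOF) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' definitions: a Gaussian kernel with bandwidth `sigma`, a kernel neighbor density taken as the inverse of the mean kernel distance to the `k` nearest neighbors, uniform weights in the ADF sum, and the hypothetical function names `kernel_distance_matrix` and `kndcof_sketch`.

```python
import numpy as np

def kernel_distance_matrix(X, sigma=1.0):
    """Pairwise distances induced by a Gaussian kernel (assumed kernel choice).

    For k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), the feature-space distance
    is sqrt(k(x, x) + k(y, y) - 2 k(x, y)) = sqrt(2 - 2 k(x, y)).
    """
    sq = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))

def kndcof_sketch(X, k=4, sigma=1.0, eps=1e-12):
    """Illustrative outlier score following the steps named in the abstract."""
    D = kernel_distance_matrix(X, sigma)
    np.fill_diagonal(D, np.inf)  # an object is not its own neighbor

    # Ascending distance series (ADS): the k nearest neighbors of each
    # object, ordered by increasing kernel distance.
    ads = np.argsort(D, axis=1)[:, :k]
    ads_dist = np.take_along_axis(D, ads, axis=1)

    # Kernel neighbor density (assumed form): inverse mean kernel distance.
    density = 1.0 / (ads_dist.mean(axis=1) + eps)

    # Average density fluctuation (ADF): weighted sum of squared density
    # differences between the object and the objects in its ADS.
    weights = np.full(k, 1.0 / k)  # assumption: uniform weights
    adf = ((density[ads] - density[:, None]) ** 2) @ weights

    # KNDCOF: ratio of the object's ADF to the average ADF of its neighbors.
    return adf / (adf[ads].mean(axis=1) + eps)

# Toy data: a 5 x 5 grid (uniform density) plus one isolated point.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)])
X = np.vstack([grid, [[2.0, 2.0]]])
scores = kndcof_sketch(X, k=4, sigma=1.0)
print(int(np.argmax(scores)))  # the isolated point (index 25) should rank highest
```

On this toy data the isolated point receives by far the largest score, because its density differs sharply from that of its nearest neighbors while those neighbors sit in a region of nearly constant density. The choice of `k`, `sigma`, and the weighting scheme would need to follow the paper's actual definitions in any real use.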

The Influence of Data Preparation on Outlier Detection in Driveability Data

A Modified Outlier Detection Method in Dynamic Data Reconciliation

Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Data-Defect Inspection with Kernel-Neighbor-Density-Change Outlier Factor

An experimental study of existing tools for outlier detection and cleaning in trajectories

Comparison of Data Visualization, Outlier Detection and Data Dimensionality Reduction Methods

Outlier Detection and Spatial Analysis Algorithms

Improved Method for Noise Detection by DBSCAN and Angle Based Outlier Factor in High Dimensional Datasets

Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

A method for outlier detection based on cluster analysis and visual expert criteria

Comparative Analysis of three outlier detection methods in univariate data sets

A Novel Outlier Detection Method for Multivariate Data

A Parametric and Non-Parametric Approach for High-Accurate Outlier Detection

Is it Safe to Drive? An Overview of Factors, Metrics, and Datasets for Driveability Assessment in Autonomous Driving

A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data

Phase I Analysis of High-Dimensional Processes in the Presence of Outliers

Human-in-the-loop Outlier Detection

Detecting outliers in the multivariate control charts for dispersion monitoring

D3O: A Framework for Distributed Distance-based Detection of Outliers in Large Data Sets

A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time Series

Multivariate Outlier Detection Approach Based on K-Nearest Neighbors and Its Application for Chemical Process Data