Abstract: Detecting boundary points (including outliers) is often more interesting than detecting normal observations, since such points can represent valid and potentially valuable patterns. Because data representation can uncover the intrinsic structure of data, we present an efficient representation-based method for detecting these points, which generally lie around the margin of densely distributed data, such as a cluster. When a point is represented as an affine combination of other points, the negative components of its representation generally correspond to the boundary points among them. In the presented method, the reverse unreachability of a point is proposed to measure the degree to which that point is a boundary point. It is calculated by counting the zero and negative components in the representation, so it explicitly takes the global data structure into account and reveals the disconnectivity between a data point and the other points. This paper shows that points in lower-density regions have higher reverse unreachability: the greater the value, the more likely the point is a boundary point, and an outlier scores higher than a boundary point. The top-m ranked points can thus be identified as outliers. Compared with related methods, our method better reflects the characteristics of the data and detects outliers and boundary points simultaneously, regardless of their distribution and the dimensionality of the space. Experimental results on a number of synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.
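The counting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the representation model, so the sketch assumes a ridge-regularized affine least-squares representation of each point by all other points, and it assumes that the score of a point counts how many other points assign it a zero or negative coefficient. The function names, the `ridge` and `tol` parameters, and the counting direction are all hypothetical choices made for illustration.

```python
import numpy as np


def affine_representation(X, ridge=1e-3):
    """Represent each row of X as an affine combination of the other rows.

    Assumption (not from the paper): ridge-regularized least squares with
    coefficients constrained to sum to one, solved in closed form.
    Returns Z where Z[i, j] is the coefficient of point j in point i.
    """
    n = X.shape[0]
    Z = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        D = X[others] - X[i]                 # other points, centered on x_i
        G = D @ D.T                          # local Gram matrix
        # Scale-aware ridge term keeps G positive definite.
        G += ridge * (np.trace(G) / (n - 1) + 1e-12) * np.eye(n - 1)
        w = np.linalg.solve(G, np.ones(n - 1))
        Z[i, others] = w / w.sum()           # enforce the sum-to-one constraint
    return Z


def reverse_unreachability(Z, tol=1e-8):
    """Count, for each point j, the other points whose coefficient for j
    is zero or negative (hypothetical reading of the abstract)."""
    non_positive = Z <= tol
    np.fill_diagonal(non_positive, False)    # a point does not represent itself
    return non_positive.sum(axis=0)


def top_m(scores, m):
    """Indices of the m highest-scoring points, highest first."""
    return np.argsort(-scores, kind="stable")[:m]
```

Under these assumptions, `top_m(reverse_unreachability(affine_representation(X)), m)` returns the candidate points to inspect. Whether that ranking matches the behavior reported in the abstract depends on the actual representation model used in the paper.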

An Efficient Representation-Based Method for Boundary Point and Outlier Detection

A Local-Gravitation-based Method for the Detection of Outliers and Boundary Points

Outlier Detection Algorithm Based on Reachable Neighbor

An Efficient Method for Boundary Points Detection Based on Data Expression

Automatic Detection of Boundary Points Based on Local Geometrical Measures

An Angle And Density-Based Method For Key Points Detection

A Robust and Efficient Boundary Point Detection Method by Measuring Local Direction Dispersion

Towards a Compact and Effective Representation for Datasets with Inhomogeneous Clusters

Efficient Outlier Detection for High-Dimensional Data

A New Outlier Detection Method Based on OPTICS

Brim: An Efficient Boundary Points Detecting Algorithm

Provable Self-Representation Based Outlier Detection in a Union of Subspaces

An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets

A New Outlier Detection Method Based on Machine Learning

Outlier Detection Using Local Density and Global Structure

Boundary Peeling: Outlier Detection Method Using One-Class Peeling

A Graph-Based Approach for Detecting Spatial Cross-Outliers from Two Types of Spatial Point Events

A neighborhood weighted-based method for the detection of outliers

A Hybrid Distance-Based Outlier Detection Approach

Robust Geodesic Based Outlier Detection for Class Imbalance Problem

Markov Boundary-Based Outlier Mining