An Efficient Representation-Based Method for Boundary Point and Outlier Detection.

Xiaojie Li, Jiancheng Lv, Zhang Yi
DOI: https://doi.org/10.1109/tnnls.2016.2614896
IF: 14.255
2018-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Detecting boundary points (including outliers) is often more interesting than detecting normal observations, since they represent valid, interesting, and potentially valuable patterns. Since data representation can uncover the intrinsic data structure, we present an efficient representation-based method for detecting such points, which are generally located around the margin of densely distributed data, such as a cluster. For each point, the negative components in its representation generally correspond to the boundary points among its affine combination of points. In the presented method, the reverse unreachability of a point is proposed to evaluate to what degree this observation is a boundary point. The reverse unreachability can be calculated by counting the number of zero and negative components in the representation. The reverse unreachability explicitly takes into account the global data structure and reveals the disconnectivity between a data point and other points. This paper reveals that the reverse unreachability of points with lower density has a higher score. Note that the score of reverse unreachability of an outlier is greater than that of a boundary point. The top-m ranked points can thus be identified as outliers. The greater the value of the reverse unreachability, the more likely the point is a boundary point. Compared with related methods, our method better reflects the characteristics of the data, and simultaneously detects outliers and boundary points regardless of their distribution and the dimensionality of the space. Experimental results obtained for a number of synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.
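The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the representation is computed, so the sketch assumes an LLE-style affine reconstruction of each point from its k nearest neighbours (weights sum to one, coefficients of non-neighbours are zero). The function names `affine_weights` and `reverse_unreachability`, and the parameters `k`, `reg`, and `tol`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine_weights(X, k=5, reg=1e-3):
    """Assumed representation: reconstruct each point as an affine
    combination (weights sum to 1) of its k nearest neighbours.
    Row i of W holds the coefficients used to represent X[i]."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    for i in range(n):
        order = np.argsort(d2[i])
        nbrs = order[order != i][:k]          # k nearest points, excluding i
        Z = X[nbrs] - X[i]                    # neighbours centred on X[i]
        G = Z @ Z.T                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # ridge term for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()              # enforce the sum-to-one constraint
    return W

def reverse_unreachability(W, tol=1e-8):
    """Assumed reading of the score: for each point j, count how many
    other points i give j a zero or negative coefficient."""
    nonpos = W <= tol
    np.fill_diagonal(nonpos, False)           # a point does not represent itself
    return nonpos.sum(axis=0)
```

Under these assumptions, a point that no other point selects as a neighbour receives only zero coefficients and therefore the maximum score, which matches the abstract's statement that outliers score higher than boundary points. Sorting the scores in descending order and taking the first m entries gives the top-m candidates.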