A double-weighted outlier detection algorithm considering the neighborhood orientation distribution of data objects

Qiang Gao, Qin-Qin Gao, Zhong-Yang Xiong, Yu-Fang Zhang, Yu-Qin Wang, Min Zhang
DOI: https://doi.org/10.1007/s10489-023-04593-6
IF: 5.3
2023-06-19
Applied Intelligence
Abstract: Outlier detection is an active research topic in data mining, and algorithms are increasingly required to handle datasets with complex shapes effectively. This paper investigates two problems of existing outlier detection algorithms: handling low-density patterns and detecting local outliers. To resolve these problems, we present a double-weighted outlier detection (DDW) algorithm considering the dense direction, which simultaneously considers the distance and orientation relationships of the neighborhood distribution. In DDW, we first propose the concept of dense direction, which moves the research object of the algorithm from a point to a region in order to explore the relationship between data points and the distribution of their neighbors more comprehensively. Then, we design a new point weighting strategy by exploring the point distribution of the neighborhood indicated by the dense directions of different data points, and a new edge weighting strategy that assigns weights to the edges between data points and their neighbors to better represent the closeness of data points. After that, we design a new double-weighted method that combines the complementary advantages of the point weighting and edge weighting strategies, addressing the problem that existing outlier detection algorithms cannot fully characterize the potential structural information inside the data. Comprehensive experiments show that the proposed method not only eliminates the sensitivity of traditional outlier detection algorithms to neighborhood parameters but also achieves higher local outlier detection accuracy than many current methods on both synthetic and UCI datasets.
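The abstract does not give the formulas behind DDW, so the following is only a toy sketch of the general idea of combining an orientation-based point weight with a distance-based edge weight. Everything in it is an assumption made for illustration: the function name `toy_double_weighted_scores`, the parameter `k`, the use of the mean unit offset toward the k nearest neighbors as a stand-in for a "dense direction", the mean neighbor distance as a stand-in for an edge weight, and the product used to combine them. It is not the paper's method.

```python
import numpy as np

def toy_double_weighted_scores(X, k=5):
    """Toy outlier score: orientation-based point weight times distance-based edge weight.

    Illustrative assumptions only, not the DDW definitions from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # diffs[i, j] is the offset from point i to point j.
    diffs = X[None, :, :] - X[:, None, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Indices, distances, and offsets of the k nearest neighbors of each point.
    nbr_idx = np.argsort(dists, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    nbr_dists = dists[rows, nbr_idx]      # shape (n, k)
    nbr_offsets = diffs[rows, nbr_idx]    # shape (n, k, d)

    # Assumed "dense direction": mean of the unit vectors toward the neighbors.
    unit = nbr_offsets / np.maximum(nbr_dists[..., None], 1e-12)
    mean_direction = unit.mean(axis=1)

    # Assumed point weight: near 0 when neighbors surround the point,
    # near 1 when all neighbors lie on one side of it.
    point_weight = np.linalg.norm(mean_direction, axis=1)

    # Assumed edge weight: mean distance to the k nearest neighbors.
    edge_weight = nbr_dists.mean(axis=1)

    # Assumed combination: simple product of the two weights.
    return point_weight * edge_weight

# Usage: a 5 x 5 grid of points plus one isolated point far from the grid.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
scores = toy_double_weighted_scores(X, k=5)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

In this toy setup the isolated point scores highest because its neighbors are both far away and all on one side, while points inside the grid have neighbors in several directions and at short distances. The sketch says nothing about how DDW itself behaves with respect to `k`.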
computer science, artificial intelligence