Algorithm of spatial outlier mining based on MST clustering
LIN Jiaxiang,CHEN Chongcheng,FAN Minghui,ZHENG Minqi
2008-01-01
Geo-information Science
Abstract: A spatial outlier is a spatial object whose non-spatial attribute values deviate significantly from those of the other objects in the dataset. How to detect spatial outliers in a spatial dataset and how to explain the causes of the anomalies in practical applications have attracted growing interest from researchers. Spatial outlier mining can yield much interesting information, but because spatial data have complicated characteristics, such as topological, orientation, and metric relations, traditional outlier mining algorithms designed for business databases are inadequate for spatial datasets; the main problem is that most existing algorithms have difficulty maintaining spatial structure characteristics during outlier mining. Because clustering and outlier mining are similar, clustering-based outlier mining is an important way to detect anomalies in a dataset. However, the diversity of clustering algorithms makes it difficult to choose a proper one for outlier mining, and since the main purpose of clustering is to find the principal features of the dataset, outliers are only by-products of clustering. Based on minimum spanning tree clustering, a new algorithm for spatial outlier mining called SOM is proposed. The algorithm preserves the basic spatial structure characteristics of spatial objects by using geometric structures, namely the Delaunay triangulated irregular network and the minimum spanning tree (MST), and it obtains the MST clustering by cutting off several of the most inconsistent edges of the MST. As a result, it can find clusters in non-spherical and unbalanced datasets, as density-based clustering algorithms do, and it also does not depend on user pre-set parameters, so the clustering result is usually more reasonable. Finally, the SOM algorithm is validated in a real application on a geochemical soil element dataset surveyed in the coastal areas of Fujian Province; the analysis shows that the algorithm is also applicable to spatial outlier mining in massive spatial datasets.
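
The general idea of MST clustering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SOM implementation, and several choices are assumptions made only for illustration: the tree is built with Prim's algorithm on the complete Euclidean graph (for points in the plane the Euclidean MST is a subgraph of the Delaunay triangulation, so the triangulation step is skipped here for brevity); an edge is treated as "inconsistent" when it is longer than the mean MST edge length plus `factor` standard deviations; and points left in clusters of at most `max_size` members are reported as outlier candidates. The names `mst_edges`, `mst_clusters`, `outlier_candidates`, `factor`, and `max_size` are hypothetical, and the sketch works on coordinates only, since the abstract does not state how non-spatial attributes enter the distance.

```python
import math
from statistics import mean, pstdev


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) tuples.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, factor=2.0):
    """Cut MST edges longer than mean + factor * std (an assumed criterion).

    Returns clusters as lists of point indices, largest first.
    """
    edges = mst_edges(points)
    if not edges:
        return [[i] for i in range(len(points))]
    weights = [w for w, _, _ in edges]
    threshold = mean(weights) + factor * pstdev(weights)

    # Union-find over the edges that are kept.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for w, i, j in edges:
        if w <= threshold:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


def outlier_candidates(points, factor=2.0, max_size=1):
    """Indices of points that end up in clusters of at most max_size members."""
    clusters = mst_clusters(points, factor)
    return sorted(i for c in clusters if len(c) <= max_size for i in c)


if __name__ == "__main__":
    # Two compact groups plus one isolated point (made-up coordinates).
    pts = [(0, 0), (1, 0), (0, 1), (1, 1),
           (10, 10), (11, 10), (10, 11), (11, 11),
           (30, 0)]
    print(outlier_candidates(pts))
```

A global length threshold is the simplest possible cut rule; criteria that compare each edge with its neighbouring edges in the tree would be closer to the notion of "inconsistent edges", but the abstract does not specify which rule SOM uses.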