Outlier mining algorithm based on the incomplete data

Hu Yang, Zhen Zhong, DaiJie Cheng
2004-01-01
Abstract: Outliers can be mined in many different ways, among which the forward search algorithm is one of the most important. When the data are incomplete, outlier mining runs into difficulties, and this problem deserves attention. The first step is to fill in the missing data. By taking the mixed pattern of missing data into account, one can simplify the application of algorithms such as the EM algorithm and the MI algorithm. In addition, a simpler and easier-to-use RE algorithm is proposed. Filling in actual data demonstrates the effectiveness of the method. When the forward search algorithm is used to mine outliers, an analysis of how the EM algorithm is constructed shows that the same method can be used to estimate the unknown parameters. Even in conventional statistical outlier tests, the test statistics that rely on residuals can also be generated by the EM algorithm. This means that the mining results are more credible when the data are first completed and then mined. Finally, clustering the data before selecting the initial subset makes the search both better and faster, and helps avoid false conclusions.
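The "complete first, then mine" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the simple linear model, the helper names `fit_line`, `em_fill`, and `forward_search`, the least-median-of-squares starting pair, and the toy data are all assumptions made for illustration. Missing responses are filled by an EM-style loop (predict the missing values from the current fit, then refit), and a forward search then grows a subset from the best-fitting points, so that the observations entering last are the outlier candidates.

```python
from itertools import combinations
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b).

    Assumes the x values are not all identical.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def em_fill(xs, ys, tol=1e-9, max_iter=500):
    """Fill missing responses (None) by alternating refitting and prediction."""
    missing = [i for i, y in enumerate(ys) if y is None]
    start = mean(y for y in ys if y is not None)
    filled = [start if y is None else y for y in ys]
    for _ in range(max_iter):
        a, b = fit_line(xs, filled)      # M-step: re-estimate the parameters
        change = 0.0
        for i in missing:                # E-step: expected value of missing y
            new = a + b * xs[i]
            change = max(change, abs(new - filled[i]))
            filled[i] = new
        if change < tol:
            break
    return filled


def forward_search(xs, ys):
    """Return indices in the order they first enter the growing subset."""
    idx = range(len(xs))

    def sq_res(a, b):
        return [(ys[i] - a - b * xs[i]) ** 2 for i in idx]

    # Initial subset (an assumed choice): the pair of points whose line
    # has the least median of squared residuals over all observations.
    best = None
    for i, j in combinations(idx, 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        score = median(sq_res(a, b))
        if best is None or score < best[0]:
            best = (score, [i, j])

    subset = best[1]
    order = list(subset)
    while len(subset) < len(xs):
        a, b = fit_line([xs[i] for i in subset], [ys[i] for i in subset])
        res = sq_res(a, b)
        # Next subset: the m + 1 observations closest to the current fit.
        subset = sorted(idx, key=lambda i: res[i])[: len(subset) + 1]
        order += [i for i in subset if i not in order]
    return order


# Toy data (hypothetical): roughly y = 2x + 1, one missing response at
# index 3 and one gross outlier at index 7.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, None, 8.8, 11.1, 12.9, 30.0, 17.2, 18.9]

filled = em_fill(xs, ys)
order = forward_search(xs, filled)
print("filled value at index 3:", round(filled[3], 3))
print("entry order:", order)
```

In this toy run the outlier is deliberately left in when the missing value is filled, so the imputed value is pulled slightly toward it. This is one reason a robust choice of initial subset matters, for example after clustering as the abstract suggests.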