Two approaches for novelty detection using random forest.

Qifeng Zhou, Hao Zhou, Yongpeng Ning, Fan Yang, Tao Li
DOI: https://doi.org/10.1016/j.eswa.2014.12.028
2015-01-01
Abstract: A framework for novelty detection using random forest is proposed. Two specific approaches using the vote distribution and the proximity matrix are presented. A comprehensive empirical study on both synthetic and real-world datasets is conducted. In many online classification tasks and in non-exhaustive learning, it is often impossible to define a training set that covers the complete set of classes. The presence of new classes, as well as novelties caused by data errors, can severely degrade classifier performance. Traditional proximity-based approaches usually rely on a distance function to measure the proximity between samples. In this study, we propose a framework that uses ensemble learning to detect novelty based on Random Forest (RF). The framework rests on the observation that an ensemble of classifiers can provide a kind of metric that characterizes different classes and measures their proximity. In particular, we apply ensemble methods with decision trees as base classifiers and present two specific approaches, RFV and RFP, based on random forest. RFV uses the vote distribution of the RF on a test sample, and RFP treats the proximity matrix of the RF as a special kernel metric for discovering novelty. The proposed approaches are compared against two common approaches, support vector domain description (SVDD) and the Gaussian mixture model (GMM), on one artificial dataset and five benchmark datasets. The experimental results show that the proposed methods achieve better performance in terms of accuracy and recall.
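The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rules used by RFV or RFP, so everything below is an assumption made for illustration: the function names are hypothetical, the vote-based score is taken as one minus the largest vote share, and the proximity between two samples is taken as the fraction of trees in which they fall into the same leaf (the usual random forest proximity). The inputs are assumed to be per-tree predicted labels and per-tree leaf indices, which with scikit-learn could be obtained from the fitted trees and from `RandomForestClassifier.apply`.

```python
from collections import Counter


def vote_distribution(tree_votes):
    """Share of trees voting for each class, for a single sample."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {label: count / n_trees for label, count in counts.items()}


def vote_novelty_score(tree_votes):
    """Assumed RFV-style score: 1 minus the largest vote share.

    Close to 0 when the trees agree on one known class, larger when the
    votes are spread out, which may indicate a sample from an unseen class.
    """
    return 1.0 - max(vote_distribution(tree_votes).values())


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two samples land in the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def proximity_novelty_score(test_leaves, train_leaves):
    """Assumed RFP-style score: 1 minus the highest proximity to any training sample."""
    return 1.0 - max(proximity(test_leaves, t) for t in train_leaves)


# Hypothetical inputs: 10 trees voting, and leaf indices from a 4-tree forest.
agreeing_votes = ["a"] * 9 + ["b"]
split_votes = ["a", "b", "c", "a", "b", "c", "a", "b", "c", "a"]
train_leaves = [[3, 7, 2, 5], [3, 7, 2, 9]]

print(vote_novelty_score(agreeing_votes), vote_novelty_score(split_votes))
print(proximity_novelty_score([8, 1, 4, 6], train_leaves))
```

In this sketch a sample would be flagged as novel when its score exceeds a threshold chosen on validation data; how the paper sets that threshold is not stated in the abstract.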