Evaluation of Feature Selection Using Wrapper For Numeric Dataset With Random Forest Algorithm

Arie Nugroho, Ahmad Zainul Fanani, Guruh Fajar Shidik
DOI: https://doi.org/10.1109/isemantic52711.2021.9573249
2021-09-18
Abstract: Preprocessing accounts for more than half of the machine learning process. Dimensionality reduction is one of the preprocessing tasks, and it includes feature extraction and feature selection. Feature selection identifies relevant features and removes irrelevant ones. The goal of this research is to select relevant features using a wrapper method for an early diabetes prediction dataset that was previously transformed into a numeric dataset. Forward and backward selection are used in the wrapper method, combined with random forest and cross-validation. Random forest is an enhancement of the decision tree: it is a group of trees, each of which may produce a different or the same result, and the majority result is taken as the final result. Feature selection with the wrapper method yields higher accuracy than no feature selection on the numeric dataset, while also reducing the number of features. Sequential forward selection achieves 98.84% accuracy with 11 selected features, and sequential backward selection achieves 99.03% accuracy with the same number of selected features. Fewer features reduce the complexity of the trees and the time required in the mining process.
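The greedy search behind sequential forward and backward selection can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the weights, and `toy_score` are hypothetical stand-ins, and in a setup like the one the abstract describes the scoring function would instead return the cross-validated accuracy of a random forest trained on the candidate feature subset.

```python
from typing import Callable, FrozenSet, Iterable

# A scorer maps a feature subset to a quality value (higher is better).
# Assumption: in the paper's setting this would be cross-validated
# random forest accuracy; here it is a toy function for illustration.
Scorer = Callable[[FrozenSet[str]], float]


def sequential_forward_selection(
    features: Iterable[str], score: Scorer, k: int
) -> FrozenSet[str]:
    """Start from the empty set and greedily add the feature that helps most."""
    selected: FrozenSet[str] = frozenset()
    remaining = set(features)
    while len(selected) < k and remaining:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda f: score(selected | {f}))
        selected = selected | {best}
        remaining.remove(best)
    return selected


def sequential_backward_selection(
    features: Iterable[str], score: Scorer, k: int
) -> FrozenSet[str]:
    """Start from all features and greedily drop the one whose removal hurts least."""
    selected: FrozenSet[str] = frozenset(features)
    while len(selected) > k:
        worst = max(sorted(selected), key=lambda f: score(selected - {f}))
        selected = selected - {worst}
    return selected


# Hypothetical per-feature usefulness, not taken from the paper's dataset.
WEIGHTS = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.1}


def toy_score(subset: FrozenSet[str]) -> float:
    # Sum of usefulness minus a small penalty per feature kept.
    return sum(WEIGHTS[f] for f in subset) - 0.01 * len(subset)


forward = sequential_forward_selection(WEIGHTS, toy_score, k=2)
backward = sequential_backward_selection(WEIGHTS, toy_score, k=2)
print(sorted(forward), sorted(backward))
```

With a real learner as the scorer, the two directions need not agree on the selected subset, which is consistent with the abstract reporting different accuracies for forward and backward selection at the same subset size.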