RANDOM FOREST AND SUPPORT VECTOR MACHINE ON FEATURES SELECTION FOR REGRESSION ANALYSIS

Rung-Ching Chen, Christine Dewi
Abstract: Feature selection becomes especially important for datasets that contain a large number of variables. Random Forest (RF) has emerged as a robust algorithm that can handle feature selection problems with many variables, and it is also efficient on regression problems. In this work, we propose combining RF, Support Vector Machine (SVM), and tuned SVM regression to improve model performance. We use four well-known regression datasets from the University of California, Irvine (UCI) machine learning repository. In addition, we report the RF ranking of important features for the influencing factors. We show that selecting the best features is essential to improving model performance. The experimental results show that our proposed model performs better than the other methods on each dataset. Across all datasets, the RMSE (Root Mean Squared Error) decreases and the r-value increases in every experiment, indicating that the regression predictions fit the data well.
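As an illustration of the two evaluation measures named in the abstract, the following minimal sketch computes RMSE and an r-value in plain Python. It assumes the r-value is the Pearson correlation coefficient between observed and predicted values, which the abstract does not state explicitly. The function names and the sample numbers are hypothetical and are not taken from the paper's experiments.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of the r-value)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical observed targets and two sets of predictions (made-up numbers):
# one from a model using all features, one from a model using selected features.
observed = [3.0, 5.0, 7.0, 9.0]
pred_all_features = [2.0, 6.0, 6.0, 11.0]
pred_selected = [3.5, 4.5, 7.5, 8.5]

for name, pred in [("all features", pred_all_features),
                   ("selected features", pred_selected)]:
    print(f"{name}: RMSE={rmse(observed, pred):.4f}, r={pearson_r(observed, pred):.4f}")
```

In this made-up example the second set of predictions has a lower RMSE and a higher r, which is the direction of improvement the abstract describes.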
Computer Science