Feature Selection for Classification using Principal Component Analysis and Information Gain

Erick Odhiambo Omuya, George Onyango Okeyo, Michael Waema Kimwele
DOI: https://doi.org/10.1016/j.eswa.2021.114765
IF: 8.5
2021-07-01
Expert Systems with Applications
Abstract: Feature selection and classification have been widely applied in fields such as business, medicine and media. High dimensionality in datasets is one of the main challenges in data classification, data mining and sentiment analysis. Irrelevant and redundant attributes also increase the complexity of classification algorithms and degrade their operation, so the algorithms perform poorly. Some existing work uses all attributes for classification, including attributes that are insignificant for the task, which leads to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support machine learning classification techniques, e.g. the Naïve Bayes technique. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall.
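The abstract names information gain as one of the two filters but does not say how it is computed or how it is combined with principal component analysis. As a rough illustration of the information gain half only, the sketch below ranks discrete features by IG(Y; X) = H(Y) - H(Y | X). The function names (`entropy`, `information_gain`, `rank_features`) and the toy data are hypothetical and are not taken from the paper; the PCA step and the Naïve Bayes classifier are not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a single discrete feature X."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    # H(Y | X): entropy of each group, weighted by the group's share of rows.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: the first column determines the class,
# the second column is independent of it.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]
ranking = rank_features(rows, labels, ["informative", "noise"])
print(ranking)
```

In a filter-style pipeline, features whose gain falls below a chosen threshold would be dropped before training. The threshold, and whether the ranking is applied before or after PCA, are design choices the abstract leaves unspecified.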
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science