Machine learning based feature selection and knowledge reasoning for CBR system under big data

Yuan Guo, Bing Zhang, Y. Sun, K. Jiang, K. Wu
DOI: https://doi.org/10.1016/j.patcog.2020.107805
IF: 8
2021-04-01
Pattern Recognition
Abstract: Under big data, the large number of features and their complex data types mean that traditional feature selection and knowledge reasoning in CBR systems no longer adapt to the new conditions. To solve these problems, this paper first proposes the Weighted Relative Probability Change of Solution Parameters (WRPCSP) algorithm to perform feature selection. It then integrates a Bayesian network (BN) with the CBR system for knowledge reasoning. Based on probability calculation and reasoning, the WRPCSP algorithm together with the BN allows the proposed CBR system to work well under big data. In addition, to overcome the efficiency problem caused by the large number of features, this paper proposes the Group-Outside (GO) algorithm, which assigns the computing tasks of big data for parallel data processing. The GO algorithm makes full use of Hadoop's computing capacity to minimize the time cost of parallel data processing. Finally, extensive experiments are performed to validate the proposed method.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
What problem does this paper attempt to address?
This paper addresses the challenges that a large number of features and complex feature data types pose for Case-Based Reasoning (CBR) systems in a big data environment. Specifically:

1. **Feature selection and knowledge reasoning**: Traditional feature selection and knowledge reasoning methods in CBR systems do not adapt to the complex data types found under big data. The paper proposes the Weighted Relative Probability Change of Solution Parameters (WRPCSP) algorithm for feature selection and integrates a Bayesian network (BN) with the CBR system for knowledge reasoning, both based on probability calculation and reasoning.
2. **Processing efficiency**: The large number of features makes computation expensive. The paper proposes the Group-Outside (GO) algorithm, which assigns the computing tasks of big data for parallel processing on Hadoop, aiming to make full use of Hadoop's computing capacity and minimize processing time.

Together, these methods are intended to let the CBR system work effectively in a big data environment; the paper validates them through extensive experiments.
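The summary does not give the WRPCSP formula. As a hypothetical illustration of the general idea its name suggests (scoring a feature by how much knowing its value shifts the probability of a solution parameter), the Python sketch below defines its own score, sum over feature values v of P(F=v) times the sum over solution values s of |P(S=s | F=v) - P(S=s)| / P(S=s), estimated from case frequencies. The score definition, the function names, and the toy cases are all assumptions for illustration, not the paper's algorithm.

```python
from collections import Counter, defaultdict


def relative_probability_change_score(cases, feature, solution):
    """Hypothetical feature score (NOT the paper's WRPCSP formula).

    score = sum over feature values v of
            P(F=v) * sum over solution values s of |P(s | v) - P(s)| / P(s)
    All probabilities are relative frequencies over `cases`.
    """
    n = len(cases)
    # Marginal distribution of the solution parameter, P(S=s).
    p_s = {s: c / n for s, c in Counter(case[solution] for case in cases).items()}
    # Group solution values by the value this feature takes.
    by_value = defaultdict(list)
    for case in cases:
        by_value[case[feature]].append(case[solution])
    score = 0.0
    for sols in by_value.values():
        p_v = len(sols) / n              # weight: P(F=v)
        counts = Counter(sols)           # counts[s] is 0 if s never occurs here
        change = sum(abs(counts[s] / len(sols) - p) / p for s, p in p_s.items())
        score += p_v * change
    return score


def select_features(cases, features, solution, k):
    """Keep the k features whose values shift the solution distribution most."""
    ranked = sorted(
        features,
        key=lambda f: relative_probability_change_score(cases, f, solution),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy case base: "temp" determines "fault", "noise" does not.
cases = [
    {"temp": "high", "noise": "a", "fault": "yes"},
    {"temp": "high", "noise": "b", "fault": "yes"},
    {"temp": "low", "noise": "a", "fault": "no"},
    {"temp": "low", "noise": "b", "fault": "no"},
]
print(select_features(cases, ["noise", "temp"], "fault", k=1))  # ['temp']
```

Here "temp" scores 2.0 because each of its values fixes the fault outcome, while "noise" scores 0.0 because its values leave the fault distribution unchanged, so only "temp" is kept.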
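To show what probabilistic knowledge reasoning in a CBR setting can look like, here is a minimal sketch of exact Bayesian network inference by enumeration over a hypothetical three-node network (Load → Fault → Alarm), where Fault plays the role of a solution parameter and Load and Alarm are case features. The network, its probabilities, and the function names are invented for illustration; the summary does not specify the paper's BN structure or inference method, and full enumeration is only practical for small networks.

```python
from itertools import product

# Hypothetical network: Load -> Fault -> Alarm (all Boolean).
# "Fault" stands in for a CBR solution parameter; the numbers are invented.
PARENTS = {"Load": [], "Fault": ["Load"], "Alarm": ["Fault"]}
# Each CPT maps a tuple of parent values to P(node = True).
CPT = {
    "Load": {(): 0.3},
    "Fault": {(True,): 0.4, (False,): 0.05},
    "Alarm": {(True,): 0.9, (False,): 0.1},
}


def joint_probability(assignment):
    """P(full assignment) as the product of each node's CPT entry."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over hidden variables."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for t in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, target: t, **dict(zip(hidden, values))}
            totals[t] += joint_probability(assignment)
    return totals[True] / (totals[True] + totals[False])


# Observing the alarm raises P(Fault) from its prior of 0.155 to about 0.623.
print(round(query("Fault", {}), 3))
print(round(query("Fault", {"Alarm": True}), 3))
```

Observing a feature (here, the alarm) updates the belief about the solution parameter, which is the kind of evidence-driven reasoning a BN adds on top of case retrieval.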
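The GO algorithm's assignment rule is not described in this summary. As a stand-in that illustrates the stated goal of spreading work across Hadoop nodes to minimize total processing time, the sketch below uses the classic longest-processing-time-first (LPT) greedy heuristic: tasks are sorted by estimated cost and each is given to the currently least-loaded worker. The task names, costs, and worker count are hypothetical, and LPT is a named substitute, not the GO algorithm.

```python
import heapq


def assign_tasks(task_costs, num_workers):
    """Greedy LPT scheduling (a stand-in, NOT the paper's GO algorithm).

    task_costs: dict mapping a hypothetical task name to its estimated cost.
    Returns (assignment per worker, makespan = load of the busiest worker).
    """
    # Min-heap of (current load, worker id): the top is the least-loaded worker.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    # Longest tasks first, each to whichever worker is currently least loaded.
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical feature-group tasks with estimated processing costs.
tasks = {"g1": 7, "g2": 5, "g3": 4, "g4": 3, "g5": 1}
assignment, makespan = assign_tasks(tasks, num_workers=2)
print(assignment)  # {0: ['g1', 'g4'], 1: ['g2', 'g3', 'g5']}
print(makespan)    # 10.0
```

LPT does not guarantee an optimal schedule in general, but its makespan is within a factor of 4/3 - 1/(3m) of optimal for m workers; in this toy case it reaches the optimum of 10, half the total cost of 20.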