Two-stage three-way enhanced technique for ensemble learning in inclusive policy text classification

Decui Liang, Bochun Yi
DOI: https://doi.org/10.1016/j.ins.2020.08.051
IF: 8.1
2021-01-01
Information Sciences
Abstract: With the development of the social economy, small and medium-sized enterprises (SMEs) play a vital role in promoting economic development. Multiple local governments in China are developing policy recommendation platforms to help SMEs better understand inclusive policies. However, these online platforms extract the key information from inclusive policy texts manually, which is time-consuming and inefficient. A policy text is composed of paragraphs, each of which corresponds to a topic. Classifying the paragraphs into topics carries a decision risk of text misclassification. Therefore, we design a two-stage three-way enhanced technique to automatically classify these text paragraphs into predefined categories. In the first stage, we use ensemble learning algorithms to construct an ensemble convolutional neural network (CNN) model in order to ensure the generalization ability and stability of the text classification results. We also develop a new weight determination method that integrates the prediction results of all base classifiers according to their accuracy and classification confidence. With the help of three-way decisions (3WD), we assign samples with poor resolution to the boundary region for secondary classification, which reduces the decision risk. In the second stage, to classify the boundary-region samples and improve the overall classification results, we use a traditional machine learning method as the secondary classifier. Finally, we conduct comparison experiments to verify the proposed method. The experimental results show that the two-stage three-way enhanced classification framework is valid and achieves better performance. The proposed method can effectively support the design of policy recommendation platforms and serve SMEs.
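The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula or the three-way thresholds, so the weight used here (validation accuracy times the model's top probability), the single acceptance threshold alpha, and the function names ensemble_probs, three_way_decide, and classify are all assumptions made for illustration. The three-way step is also simplified to "accept the top class" versus "defer to the boundary region".

```python
from typing import Callable, List, Optional, Tuple


def ensemble_probs(probs_per_model: List[List[float]],
                   accuracies: List[float]) -> List[float]:
    """Fuse per-model class probabilities for one sample.

    Assumed weighting (not from the paper): each model's weight is its
    validation accuracy times its own top probability (its confidence),
    normalized so the weights sum to 1.
    """
    raw = [acc * max(p) for p, acc in zip(probs_per_model, accuracies)]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_classes = len(probs_per_model[0])
    return [sum(w * p[c] for w, p in zip(weights, probs_per_model))
            for c in range(n_classes)]


def three_way_decide(probs: List[float],
                     alpha: float = 0.7) -> Tuple[Optional[int], str]:
    """Accept the top class if it is confident enough, else defer.

    alpha is a hypothetical threshold; samples below it go to the
    boundary region for secondary classification.
    """
    top = max(range(len(probs)), key=probs.__getitem__)
    if probs[top] >= alpha:
        return top, "accept"
    return None, "boundary"


def classify(probs_per_model: List[List[float]],
             accuracies: List[float],
             secondary: Callable[[object], int],
             features: object,
             alpha: float = 0.7) -> Tuple[int, str]:
    """Stage 1: weighted ensemble plus three-way decision.
    Stage 2: a secondary classifier labels boundary-region samples."""
    fused = ensemble_probs(probs_per_model, accuracies)
    label, region = three_way_decide(fused, alpha)
    if label is None:
        label = secondary(features)
    return label, region
```

In this sketch, secondary stands in for any traditional classifier (for example a fitted model's predict call wrapped in a function), and the per-model probabilities would come from the softmax outputs of the base CNNs.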