The Automatic Text Classification Method Based on BERT and Feature Union

Wenting Li, Shangbing Gao, Hong Zhou, Zihe Huang, Kewen Zhang, Wei Li
DOI: https://doi.org/10.1109/icpads47876.2019.00114
2019-12-01
Abstract: Traditional deep learning models for text classification mostly use a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network) and take dynamic character-level or word-level embeddings as input, so their text feature extraction is not comprehensive. In the context of Internet of Things development, this paper proposes an automatic text classification method based on BERT (Bidirectional Encoder Representations from Transformers) and feature fusion. First, the BERT model converts the text into dynamic character-level embeddings. The output features of a BiLSTM (Bidirectional Long Short-Term Memory) network and a CNN are then merged, combining the CNN's strength at extracting local features with the BiLSTM's memory, which links the extracted contextual features. The fused features represent the text better and thereby improve the accuracy of the text classification task. A comparative study shows that the proposed method outperforms state-of-the-art approaches in accuracy. It effectively improves the accuracy of tag prediction for text data with sequence features and obvious local features.
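The fusion step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: random vectors stand in for the BERT embeddings, a plain tanh recurrent cell stands in for the LSTM cell, and every name (conv_max_pool, rnn_last_state, fuse, the sizes) is hypothetical. It only shows how a CNN branch (local features) and a bidirectional recurrent branch (context features) might be concatenated into one text representation.

```python
import math
import random

def conv_max_pool(embeddings, filters, width):
    """CNN branch: slide each filter over `width`-token windows, apply ReLU, max-pool over time."""
    features = []
    for filt in filters:  # each filter is a flat list of width * dim weights
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(embeddings) - width + 1):
            window = [x for tok in embeddings[start:start + width] for x in tok]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        features.append(best)
    return features

def rnn_last_state(embeddings, w_in, w_rec):
    """Recurrent branch (assumption: a simple tanh cell replaces the LSTM cell for brevity)."""
    hidden = [0.0] * len(w_rec)
    for tok in embeddings:
        hidden = [
            math.tanh(sum(a * x for a, x in zip(w_in[j], tok))
                      + sum(b * h for b, h in zip(w_rec[j], hidden)))
            for j in range(len(w_rec))
        ]
    return hidden

def fuse(embeddings, filters, width, fwd, bwd):
    """Concatenate CNN features with forward and backward recurrent states."""
    local = conv_max_pool(embeddings, filters, width)
    forward = rnn_last_state(embeddings, *fwd)
    backward = rnn_last_state(embeddings[::-1], *bwd)  # read the sequence right to left
    return local + forward + backward

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, dim, n_filters, width, hidden_size = 6, 4, 3, 2, 5  # illustrative sizes only
embeddings = random_matrix(seq_len, dim, rng)  # stand-in for BERT output, one vector per token
filters = random_matrix(n_filters, width * dim, rng)
fwd = (random_matrix(hidden_size, dim, rng), random_matrix(hidden_size, hidden_size, rng))
bwd = (random_matrix(hidden_size, dim, rng), random_matrix(hidden_size, hidden_size, rng))

fused = fuse(embeddings, filters, width, fwd, bwd)
print(len(fused))  # 13 = 3 CNN features + 2 * 5 recurrent states
```

In a trained model the fused vector would typically feed a classification layer such as softmax. The abstract does not specify that layer or how the two branches are weighted, so the plain concatenation here is only one plausible reading of "combined and merged".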