Abstract: Document classification has become an indispensable technology for realizing intelligent information services. It is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule to support higher-level applications. Semantic analysis has been shown to improve the performance of document classification. Although semantic analysis has been incorporated into previous automatic document classification methods, the growing number of documents stored online has drawn greater attention to the use of semantic information for document classification, as it can greatly reduce human effort. In the present paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to solve the polysemy problem and (2) a strong correlation analysis method (SCM) to solve the synonym problem. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, the efficient semantic similarity and correlation analysis make it possible to eliminate word ambiguity and extract useful features, improving the accuracy of semantic document classification for Chinese texts. © 2020 Elsevier B.V. All rights reserved.

Short Documents Classification Method in Very Large Text Database

Massive Short Documents Classification Method Based on Frequent Term Set Clustering

Short documents clustering in very large text databases

Database Systems for Advanced Applications

Study on Massive Short Documents Clustering Technology

Improving short text classification using public search engines

Convolutional Long Short-term Memory for Long Length Document Classification

Extremely Short Chinese Text Classification Method Based on Bidirectional Semantic Extension

Towards Effective Short Text Deep Classification

Short Text Classification Based on Strong Feature Thesaurus

Research and application of a method for real estate document image classification

Algorithm for Chinese Short-Text Classification Using Concept Description

Comparisons and Selections of Features and Classifiers for Short Text Classification

Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis

Chinese Documents Categorization Based on N-gram Information

Short text classification based on LDA topic model

Hierarchical Multiple Granularity Attention Network for Long Document Classification

Chinese Documents Classification Based on N-Grams

Classifying Extremely Short Texts by Exploiting Semantic Centroids in Word Mover's Distance Space

Combining Lexical and Semantic Features for Short Text Classification

Design and implementation of an ontology algorithm for web documents classification