Abstract: Short text classification (STC) has attracted increasing interest with the rapid growth of Web and social media data in short text form. It is more challenging than traditional text classification (TC) because short texts have sparse features, which makes state-of-the-art TC approaches perform poorly when applied to them directly. Existing STC approaches address the sparsity problem mainly by enriching text content with external corpora or additional information. Although this can improve performance, the gain relies heavily on the amount and quality of the external or additional information. Worse, such information is not always available and is costly to acquire. In this paper, we introduce a structured sparse representation classifier to classify short texts effectively, and develop an approach called convex hull vertices selection to reduce data correlation and redundancy in the dictionary (the set of training texts), which substantially boosts STC efficiency and performance. To the best of our knowledge, this is the first work that exploits structured sparsity for STC. Experiments on five datasets show that the proposed approach outperforms state-of-the-art TC methods in classification effectiveness, and outperforms the traditional sparse representation (SR) classifier in both classification effectiveness and efficiency. Furthermore, we carry out an experiment on short texts expanded with additional content, which indirectly shows that our approach performs better than existing STC methods that exploit external text sources. © 2015 Elsevier Inc. All rights reserved.
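
As an illustration of the traditional SR classifier that the abstract uses as a baseline, the following is a minimal sketch, not the paper's structured variant and not its convex hull vertices selection. It assumes a plain l1-penalised sparse code computed with ISTA (iterative soft-thresholding) and assigns the class whose training texts give the smallest reconstruction residual. The names `sparse_code`, `src_classify`, `ATOMS`, and `LABELS`, the penalty `lam`, and the toy three-term vectors are all hypothetical and chosen only for this example.

```python
import math

# Toy dictionary (assumption): each atom is a unit-length term vector of one
# training text, with its class label in the same position in LABELS.
ATOMS = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0], [0.0, 0.6, 0.8]]
LABELS = ["sports", "sports", "tech", "tech"]


def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal step for the l1 penalty)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(atoms, y, lam=0.05, n_iter=500):
    """Approximately minimise 0.5*||y - sum_j x[j]*atoms[j]||^2 + lam*||x||_1 with ISTA."""
    # The squared Frobenius norm of the dictionary upper-bounds the Lipschitz
    # constant of the gradient, so 1/L is a safe, conservative step size.
    L = sum(v * v for atom in atoms for v in atom) or 1.0
    x = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Residual r = y - D x for the current coefficients.
        r = list(y)
        for coef, atom in zip(x, atoms):
            for i, v in enumerate(atom):
                r[i] -= coef * v
        # Gradient step on the squared error, then soft-threshold each coefficient.
        x = [
            soft_threshold(xj + sum(a * ri for a, ri in zip(atom, r)) / L, lam / L)
            for xj, atom in zip(x, atoms)
        ]
    return x


def src_classify(atoms, labels, y, lam=0.05):
    """Return the label whose atoms alone reconstruct y with the smallest residual."""
    x = sparse_code(atoms, y, lam)
    residuals = {}
    for c in set(labels):
        # Reconstruct y using only the coefficients that belong to class c.
        recon = [0.0] * len(y)
        for coef, atom, lab in zip(x, atoms, labels):
            if lab == c:
                for i, v in enumerate(atom):
                    recon[i] += coef * v
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.get)
```

With this toy dictionary, `src_classify(ATOMS, LABELS, [0.9, 0.1, 0.0])` selects `"sports"`, because the test vector is reconstructed almost entirely from the two sports atoms. A structured variant would replace the per-coefficient l1 penalty with one that encourages whole class groups to be active or inactive together.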

Short Text Classification Based on Strong Feature Thesaurus

Improving Short Text Classification Through Better Feature Space Selection

Short Text Model Based on Strong Feature Thesaurus

Improving short text classification using public search engines

Research on Deep Web Classification Based on Domain Feature Text

A multiclass classification framework for document categorization

Feature Selection Method on Imbalanced Text

Combining Lexical and Semantic Features for Short Text Classification

Extremely Short Chinese Text Classification Method Based on Bidirectional Semantic Extension

Short text classification by detecting information path

Short Text Classification Improved by Feature Space Extension

Short Text Classification Model based on Pre-trained Language Model with Feature Fusion

Short text classification based on bidirectional TCN and attention mechanism

Short-Text Classification Detector: A Bert-Based Mental Approach

Short Text Classification via Term Graph

Research of Chinese Text Classification Methods Based on Semantic Vector and Semantic Similarity

Effectively Classifying Short Texts by Structured Sparse Representation with Dictionary Filtering

A Deep Learning Short Text Classification Model Integrating Part of Speech Features

Improving Text Classification Using Local Latent Semantic Indexing

Semantic similarity-aware feature selection and redundancy removal for text classification using joint mutual information

A multi-semantic passing framework for semi-supervised long text classification