Abstract: Short text classification (STC) has attracted increasing interest with the rapid growth of Web and social media data in short text form. It is more challenging than traditional text classification (TC) because short texts have sparse features, so state-of-the-art TC approaches perform poorly when applied to short texts directly. Existing STC approaches address this sparsity mainly by enriching text content with external corpora or additional information. Although this can improve performance, the gains depend heavily on the amount and quality of that external or additional information. Worse, such information is not always available and is costly to acquire. In this paper, we introduce a structured sparse representation classifier to classify short texts, and develop an approach called convex hull vertices selection to reduce data correlation and redundancy in the dictionary (the set of training texts), which substantially boosts STC efficiency and performance. To the best of our knowledge, this is the first work that exploits structured sparsity for STC. Experiments over five datasets show that the proposed approach outperforms state-of-the-art TC methods in classification effectiveness and the traditional sparse representation (SR) classifier in both classification effectiveness and classification efficiency. Furthermore, we carry out an experiment on classifying short texts expanded with additional content, which indirectly shows that our approach performs better than existing STC methods that exploit external text sources. (C) 2015 Elsevier Inc. All rights reserved.
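
As a rough illustration of the traditional SR classifier that the abstract uses as a baseline, the sketch below codes a test text as a sparse combination of training texts (the dictionary) and assigns the class whose atoms reconstruct it with the smallest residual. It is not the paper's structured variant and does not implement convex hull vertices selection. The ISTA solver, the function names (`sparse_code`, `classify`), the penalty `lam`, and the toy bag-of-words vectors are all assumptions made for this example.

```python
import math

def soft_threshold(v, t):
    # Shrink v toward zero by t (proximal operator of the l1 penalty).
    return math.copysign(max(abs(v) - t, 0.0), v)

def sparse_code(atoms, y, lam=0.05, steps=500):
    """ISTA sketch for min_x 0.5*||y - sum_j x[j]*atoms[j]||^2 + lam*||x||_1."""
    n, d = len(atoms), len(y)
    # Squared Frobenius norm: a safe upper bound on the gradient's Lipschitz constant.
    lip = sum(v * v for a in atoms for v in a) or 1.0
    x = [0.0] * n
    for _ in range(steps):
        recon = [sum(x[j] * atoms[j][i] for j in range(n)) for i in range(d)]
        resid = [recon[i] - y[i] for i in range(d)]
        grad = [sum(atoms[j][i] * resid[i] for i in range(d)) for j in range(n)]
        x = [soft_threshold(x[j] - grad[j] / lip, lam / lip) for j in range(n)]
    return x

def classify(atoms, labels, y, lam=0.05):
    """Return the label whose atoms alone give the smallest reconstruction residual."""
    x = sparse_code(atoms, y, lam)
    best_label, best_err = None, float("inf")
    for label in sorted(set(labels)):
        # Reconstruct y using only the coefficients that belong to this class.
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)) if labels[j] == label)
                 for i in range(len(y))]
        err = math.sqrt(sum((y[i] - recon[i]) ** 2 for i in range(len(y))))
        if err < best_err:
            best_label, best_err = label, err
    return best_label

# Toy dictionary: each training text is a 4-term bag-of-words vector (hypothetical data).
atoms = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.5]]
labels = ["sports", "sports", "tech", "tech"]

print(classify(atoms, labels, [1.0, 0.8, 0.0, 0.0]))
print(classify(atoms, labels, [0.0, 0.0, 0.9, 1.0]))
```

Every training text serves as a dictionary atom here, so the cost of each classification grows with the size of the training set. That cost is what dictionary filtering, as described in the abstract, aims to reduce.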

Filtering and Classifying Relevant Short Text with a Few Seed Words

CLDA: Feature Selection for Text Categorization Based on Constrained LDA

Improving Short Text Classification Through Better Feature Space Selection

Improving short text classification using public search engines

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation

TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement

Non-Negative Sparse Semantic Coding for Text Categorization

Short Text Entity Linking With Fine-Grained Topics

Short Text Classification Based on Strong Feature Thesaurus

Related Text Discovery Through Consecutive Filtering and Supervised Learning

Topic model based on co-occurrence word networks for unbalanced short text datasets

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Constructing Pseudo Documents With Semantic Similarity For Short Text Topic Discovery

A High Performance Two-Class Chinese Text Categorization Method

Semantic similarity-aware feature selection and redundancy removal for text classification using joint mutual information

Effectively Classifying Short Texts by Structured Sparse Representation with Dictionary Filtering

BTM: Topic Modeling over Short Texts

Bayesian Text Classification and Summarization Via A Class-Specified Topic Model

Supervised cross-collection topic modeling