Abstract: Deep learning and deep neural networks have recently attracted considerable attention and emerged as a predominant field of research in the artificial intelligence community. The resulting techniques have been applied successfully in domains such as automatic speech recognition, information retrieval, and text classification. Among them, long short-term memory (LSTM) networks are well suited to such tasks: they capture long-range dependencies among words efficiently while alleviating the vanishing and exploding gradient problems during training. Following this line of research, this paper explores a novel use of a Siamese LSTM-based method to learn more accurate document representations for text categorization. The network architecture takes a pair of variable-length documents as input and uses pairwise learning to generate distributed document representations that more precisely reflect the semantic distance between any pair of documents. As a result, documents that share a semantic or topic label are mapped to similar representations with relatively high semantic similarity. Experiments on two benchmark text categorization tasks, IMDB and 20Newsgroups, show that a three-layer deep neural network classifier that takes as input a document representation learned by the Siamese LSTM sub-networks achieves competitive performance relative to several state-of-the-art methods.
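
The pairwise setup described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the paper's implementation: the abstract does not state the loss or distance function, so the contrastive loss, the Euclidean distance, the margin, and all names here (`TinyLSTMEncoder`, `siamese_pair_loss`) are assumptions chosen for illustration. Training (gradient updates) and the downstream three-layer classifier are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMEncoder:
    """Single LSTM layer; the final hidden state is used as the document vector."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # inputs + previous hidden state + bias
        # 4 * hidden_dim rows: input, forget, output gates and candidate cell.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(4 * hidden_dim)]

    def encode(self, sequence):
        n = self.hidden_dim
        h = [0.0] * n
        c = [0.0] * n
        for x in sequence:  # one word vector per step, any sequence length
            z = list(x) + h + [1.0]
            pre = [sum(w * v for w, v in zip(row, z)) for row in self.W]
            i = [sigmoid(p) for p in pre[0:n]]
            f = [sigmoid(p) for p in pre[n:2 * n]]
            o = [sigmoid(p) for p in pre[2 * n:3 * n]]
            g = [math.tanh(p) for p in pre[3 * n:4 * n]]
            c = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
            h = [o[k] * math.tanh(c[k]) for k in range(n)]
        return h


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(distance, same_label, margin=1.0):
    # Assumed loss: pull same-label pairs together, push others past the margin.
    if same_label:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def siamese_pair_loss(encoder, doc_a, doc_b, same_label, margin=1.0):
    # Both documents pass through the SAME encoder, i.e. the weights are shared.
    d = euclidean(encoder.encode(doc_a), encoder.encode(doc_b))
    return contrastive_loss(d, same_label, margin)


if __name__ == "__main__":
    enc = TinyLSTMEncoder(input_dim=3, hidden_dim=4)
    doc_a = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.0]]  # two "words"
    doc_b = [[0.5, 0.5, 0.0]]                   # one "word"
    print(siamese_pair_loss(enc, doc_a, doc_b, same_label=False))
```

Sharing one encoder between both inputs is what makes the network Siamese: any update that reduces the pair loss changes the representation of every document, so same-label documents drift toward each other in the learned space.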

An Efficient Method for Document Categorization Based on Word2vec and Latent Semantic Analysis

Document Clustering Using Locality Preserving Indexing

Knowledge-based Document Embedding for Cross-Domain Text Classification

Supervised latent semantic indexing for document categorization

A multiclass classification framework for document categorization

LSASGT: An Approach to Text Categorization Based on Latent Semantic Analysis and Spectral Graph Transducer

Fast text categorization based on collaborative work in the semantic and class spaces

A Vector Space Model Based Document Classification System

Document Classification Based on Word Vectors

Non-Negative Sparse Semantic Coding for Text Categorization

Collaborative Work with Linear Classifier and Extreme Learning Machine for Fast Text Categorization

Tensor Space Model for Document Analysis

Chinese Document Categorization without Dictionary Support and Segmentation Processing

Multiple-instance Learning for Text Categorization Based on Semantic Representation

Investigating Siamese LSTM networks for text categorization

Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Improving Document Classification with Multi-Sense Embeddings

Document classification with distributions of word vectors

Document Classification with Spherical Word Vectors

Generating Different Semantic Spaces For Document Classification

Local Word Bag Model for Text Categorization