Abstract: Deep learning and deep neural networks have recently attracted considerable attention and emerged as a predominant field of research in the artificial intelligence community. The resulting techniques have been applied successfully in domains such as automatic speech recognition, information retrieval, and text classification. Among them, long short-term memory (LSTM) networks are well suited to such tasks: they capture long-range dependencies among words efficiently while alleviating the vanishing and exploding gradient problems during training. Following this line of research, this paper explores a novel use of a Siamese LSTM-based method to learn more accurate document representations for text categorization. The network architecture takes a pair of variable-length documents as input and uses pairwise learning to generate distributed document representations that more precisely reflect the semantic distance between any pair of documents. As a result, documents associated with the same semantic or topic label are mapped to similar representations with relatively high semantic similarity. Experiments on two benchmark text categorization tasks, IMDB and 20Newsgroups, show that a three-layer deep neural network classifier that takes as input a document representation learned by the Siamese LSTM sub-networks achieves performance competitive with several state-of-the-art methods.
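
The pairwise setup described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. Everything named here is an assumption made for illustration and not the paper's implementation: the `TinyLSTM` class, the `cosine` and `contrastive_loss` helpers, the margin value, the vector sizes, and the random word vectors are all hypothetical. In particular, the abstract does not state which similarity measure or pairwise loss is used; cosine similarity with a margin-based contrastive loss is one common choice. The point is only that a single shared encoder maps two documents of different lengths to fixed-size vectors that can then be compared.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM encoder with random, untrained weights.

    The same instance encodes both documents of a pair, which is what
    makes the arrangement "Siamese" (shared weights).
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell state.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _gate(self, name, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def encode(self, sequence):
        """Return the final hidden state for a sequence of word vectors."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            z = list(x) + h  # concatenate current input and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("c", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_loss(similarity, same_label, margin=0.5):
    """Assumed pairwise loss: pull same-label pairs toward similarity 1,
    push different-label pairs below the margin."""
    if same_label:
        return (1.0 - similarity) ** 2
    return max(0.0, similarity - margin) ** 2


# Two stand-in "documents" of different lengths (random 4-d word vectors).
rng = random.Random(1)


def random_doc(n_words, dim=4):
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_words)]


encoder = TinyLSTM(input_dim=4, hidden_dim=3)
doc_a, doc_b = random_doc(5), random_doc(9)
rep_a, rep_b = encoder.encode(doc_a), encoder.encode(doc_b)
sim = cosine(rep_a, rep_b)
print("similarity:", sim)
print("loss if same label:", contrastive_loss(sim, same_label=True))
print("loss if different labels:", contrastive_loss(sim, same_label=False))
```

In a trained system the weights would be updated to reduce this loss over many labeled pairs, and the resulting document vectors would then be passed to a separate classifier, as the abstract describes.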

Multiple-instance Learning for Text Categorization Based on Semantic Representation

Semantic Image Retrieval Based on Multiple-Instance Learning

Text Categorization Based on Domain Ontology

Supervised Explicit Semantic Representation for Text Categorization

Fast text categorization based on collaborative work in the semantic and class spaces

Learning Semantic Similarity For Multi-Label Text Categorization

Non-Negative Sparse Semantic Coding for Text Categorization

A multiclass classification framework for document categorization

Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Exploiting Textual and Visual Features for Image Categorization

Aggressive Dimensionality Reduction With Reinforcement Local Feature Selection For Text Categorization

Collaborative Work with Linear Classifier and Extreme Learning Machine for Fast Text Categorization

Investigating Siamese LSTM networks for text categorization

LSASGT: an Approach to Text Categorization Based on Latent Semantic Analysis and Spectral Graph Transducer

Semantic Hilbert Space for Text Representation Learning

Semantic representation in text classification using topic signature mapping

Learning Effective Features for Chinese Text Categorization

Effective Collaborative Representation Learning for Multilabel Text Categorization

Multi-label text classification based on semantic-sensitive graph convolutional network

Multi-label Text Categorization with Joint Learning Predictions-as-Features Method

Combining Latent Semantic Learning and Reduced Hypergraph Learning for Semi-Supervised Image Categorization