Abstract: Cross-domain text classification aims to train an accurate model for a target domain by using labeled text data from a source domain whose data distribution is different but related. To narrow the distribution gap between domains, most previous approaches use the bag-of-words model to obtain a latent feature representation of the text. However, this kind of model loses word order information and misses the background knowledge of the text. As a result, the conceptual information of the text is largely ignored. In this paper, we propose a novel framework for cross-domain text classification, named Document Concept Vector, which leverages both a neural network and a knowledge base to produce a high-quality representation of the text. Specifically, a raw document is first transformed into a conceptualized document, which consists of a set of concepts, by utilizing a large taxonomy knowledge base. The conceptualized document is then transformed into a document vector through the neural network, and this vector is used as the concept-level feature of the original document. Finally, we conduct experiments on two real-world corpora and compare the framework with both traditional classification algorithms and several state-of-the-art approaches to cross-domain text classification. The results validate the effectiveness of our framework.
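
The two steps described in the abstract (conceptualization, then vectorization) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: `TOY_TAXONOMY` is a hypothetical hand-written stand-in for a large taxonomy knowledge base, and the seeded random concept embeddings averaged in `document_vector` are a stand-in for the vectors a trained neural network would produce.

```python
import random

# Hypothetical toy taxonomy (term -> concept). A real system would query a
# large taxonomy knowledge base instead of this hand-written dictionary.
TOY_TAXONOMY = {
    "python": "programming_language",
    "java": "programming_language",
    "apple": "company",
    "microsoft": "company",
    "paris": "city",
}


def conceptualize(document, taxonomy):
    """Replace each known token with its concept, keeping the original order.

    Tokens that the taxonomy does not cover are dropped.
    """
    tokens = document.lower().split()
    return [taxonomy[t] for t in tokens if t in taxonomy]


def make_concept_embeddings(taxonomy, dim=8, seed=0):
    """Assign each concept a seeded random vector.

    Stand-in only: in practice these vectors would be learned by a neural model.
    """
    rng = random.Random(seed)
    concepts = sorted(set(taxonomy.values()))
    return {c: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for c in concepts}


def document_vector(concepts, embeddings, dim=8):
    """Average the concept vectors into a single document-level feature vector."""
    if not concepts:
        return [0.0] * dim
    sums = [0.0] * dim
    for concept in concepts:
        for i, value in enumerate(embeddings[concept]):
            sums[i] += value
    return [s / len(concepts) for s in sums]


embeddings = make_concept_embeddings(TOY_TAXONOMY)
source_doc = conceptualize("Python at Microsoft", TOY_TAXONOMY)
target_doc = conceptualize("Java at Apple", TOY_TAXONOMY)
source_vec = document_vector(source_doc, embeddings)
target_vec = document_vector(target_doc, embeddings)
```

In this toy setting the two documents share no surface words, yet both map to the same concept sequence and therefore to the same vector, which is the intuition behind using concept-level features to narrow the gap between domains.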

Document Classification Method Based on Word2vec

Knowledge-based Document Embedding for Cross-Domain Text Classification

Document Classification Based on Word Vectors

Document classification with distributions of word vectors

A Vector Space Model Based Document Classification System

A multiclass classification framework for document categorization

Document Classification with Spherical Word Vectors

Research of Chinese Text Classification Methods Based on Semantic Vector and Semantic Similarity

Similarity Analysis of Law Documents Based on Word2vec

Generating Different Semantic Spaces For Document Classification

Convolutional Long Short-term Memory for Long Length Document Classification

WordNet-based Concept Vector Space Model for Text Classification

A document classification approach by GA feature extraction based corner classification neural network

Chinese Sentiment Classification Using A Neural Network Tool-Word2vec

A Novel Approach to Document Classification using WordNet

Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems

Semantic representation in text classification using topic signature mapping

Bag-of-Embeddings for Text Classification

Text Classification With Document Embeddings

Improving Document Classification with Multi-Sense Embeddings

Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec