Abstract: Deep probabilistic aspect models are widely used in document analysis to extract semantic information and obtain descriptive topics. However, two problems may limit their application. First, common words that are shared among all documents and carry little representational meaning may reduce the representation ability of the learned topics. Second, it is difficult to introduce supervision information into hierarchical topic models so as to fully utilize the side information of documents. To address these problems, in this article, we first propose deep diverse latent Dirichlet allocation (DDLDA), a deep hierarchical topic model that, by introducing shared topics, can yield more meaningful semantic topics containing fewer common and meaningless words. Moreover, we develop a variational inference network for DDLDA, which helps us further generalize DDLDA to a supervised deep topic model called max-margin DDLDA (mmDDLDA) by employing the max-margin principle as the classification criterion. Compared to DDLDA, mmDDLDA can discover more discriminative topical representations. In addition, a continual hybrid method combining stochastic-gradient MCMC and variational inference is put forward for deep latent Dirichlet allocation (DLDA)-based models to make them more practical in real-world applications. The experimental results demonstrate that DDLDA and mmDDLDA are more efficient than existing unsupervised and supervised topic models in discovering highly discriminative topic representations and achieving higher classification accuracy. Meanwhile, DLDA and our proposed models, when trained by the proposed continual learning approach, not only perform well at preventing catastrophic forgetting but also fit the evolving new tasks well.

Interpretative Topic Categorization Via Deep Multiple Instance Learning

Hierarchical and Bidirectional Joint Multi-Task Classifiers for Natural Language Understanding

Multiple-instance Learning for Text Categorization Based on Semantic Representation

Investigating Siamese LSTM networks for text categorization

Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Collaborative Work with Linear Classifier and Extreme Learning Machine for Fast Text Categorization

A Deep Multi-Task Representation Learning Method for Time Series Classification and Retrieval.

Latent Topic Text Representation Learning on Statistical Manifolds.

Max-Margin Deep Diverse Latent Dirichlet Allocation With Continual Learning

A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

Discriminative Topic Mining via Category-Name Guided Text Embedding

An Efficient Framework by Topic Model for Multi-label Text Classification

Multi-label Text Categorization with Hidden Components.

Sparse Multiple Instance Learning As Document Classification.

Effective Collaborative Representation Learning for Multilabel Text Categorization

Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning.

A Joint Model Of Extended Lda And Ibtm Over Streaming Chinese Short Texts

Amplifying document categorization with advanced features and deep learning

Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference

Image Annotation by Multiple-Instance Learning with Discriminative Feature Mapping and Selection

Supervised latent semantic indexing for document categorization