Bi-Directional Recurrent Attentional Topic Model

Shuangyin Li, Yu Zhang, Rong Pan
DOI: https://doi.org/10.1145/3412371
IF: 4.157
2020-12-31
ACM Transactions on Knowledge Discovery from Data
Abstract: In a document, the topic distribution of a sentence depends on both its own content and the topics of its neighboring sentences, and the neighboring sentences typically influence it with different weights. The neighboring sentences of a sentence include both the preceding and the subsequent sentences. Moreover, a document can naturally be treated as a sequence of sentences. Most existing work on Bayesian document modeling does not take these points into consideration. To fill this gap, we propose a bi-Directional Recurrent Attentional Topic Model (bi-RATM) for document embedding. The bi-RATM not only takes advantage of the sequential order of sentences but also uses the attention mechanism to model the relations among successive sentences. To support the bi-RATM, we propose a bi-Directional Recurrent Attentional Bayesian Process (bi-RABP) to handle the sequences. Based on the bi-RABP, the bi-RATM fully utilizes the bi-directional sequential information of the sentences in a document. An online version of bi-RATM is also proposed to handle large-scale corpora. Experiments on two corpora show that the proposed model outperforms state-of-the-art methods on document modeling and classification.
computer science, information systems, software engineering