Hierarchical Topic-Aware Contextualized Transformers
Ruiying Lu, Bo Chen, Dandan Guo, Dongsheng Wang, Mingyuan Zhou
DOI: https://doi.org/10.1109/taslp.2023.3339344
2023-01-01
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Abstract: Trained on disjoint fixed-length segments, Transformers convert static word embeddings into contextualized word representations. However, they often restrict the context of a token to the segment it resides in and hence neglect contextual information across segments, failing to capture longer-term dependencies beyond the predefined segment length. This article uses a probabilistic deep topic model to provide hierarchical contextualized embeddings at both the token and segment levels, and integrates topic information through a constrained attention mechanism. The proposed method not only injects contextualized topic information into Transformers, but also controls language generation guided by specific topics, styles, and sentiments. Three plug-and-play modules are proposed: the contextual topical token embedding, the segment embedding, and the multi-head topic attention mechanism. We aim to capture semantic coherence and word concurrence patterns at the global level, and also to enrich the representation of each token by adapting it to its local context, with a negligible increase in memory footprint and computational time. Experiments on various corpora show that, with only marginally more parameters, the proposed hierarchical topic-aware contextualized Transformers consistently outperform their conventional counterparts and generate sentences and paragraphs that accord with human preferences.
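As a rough illustration of how segment-level topic information could be combined with token embeddings and attention, the following is a minimal NumPy sketch. It is not the authors' formulation: the abstract does not specify the modules in detail, so the function names (`topic_aware_embed`, `topic_attention`), the additive combination, the use of topic vectors as extra attention keys, and the random inputs are all assumptions made for illustration. Learned projections, multiple heads, causal masking, and the topic model itself are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topic_aware_embed(token_emb, topic_props, topic_emb):
    """Hypothetical helper: add one segment-level topic vector to every token.

    token_emb:   (T, d) static token embeddings for one segment
    topic_props: (K,)   topic proportions for the segment (sum to 1)
    topic_emb:   (K, d) one embedding vector per topic
    """
    segment_vec = topic_props @ topic_emb   # (d,) weighted mix of topic vectors
    return token_emb + segment_vec          # broadcast over the T tokens

def topic_attention(x, topic_emb):
    """Hypothetical helper: single-head attention over tokens plus topic vectors."""
    keys = np.concatenate([x, topic_emb], axis=0)   # (T + K, d)
    scores = x @ keys.T / np.sqrt(x.shape[-1])      # (T, T + K)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys, weights

# Toy inputs (assumed sizes): 5 tokens, 3 topics, 4-dimensional embeddings.
rng = np.random.default_rng(0)
T, K, d = 5, 3, 4
token_emb = rng.normal(size=(T, d))
topic_emb = rng.normal(size=(K, d))
topic_props = softmax(rng.normal(size=K))

x = topic_aware_embed(token_emb, topic_props, topic_emb)
out, weights = topic_attention(x, topic_emb)
```

In this sketch the topic vectors add only `K * d` parameters on top of the token embeddings, which is in the spirit of the abstract's claim of marginal extra parameters, though the actual parameter counts in the article may differ.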