CogLTX: Applying BERT to Long Texts.

Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang
2020-01-01
Abstract: BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text with a sliding window or simplifying transformers, suffer from insufficient long-range attention or need customized CUDA kernels. The maximum length limit in BERT reminds us of the limited capacity (5 ~ 9 chunks) of human working memory: how, then, do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley [2], the proposed CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning, and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use interventions to create supervision. As a general algorithm, CogLTX outperforms or gets comparable results to SOTA models on various downstream tasks with memory overheads independent of the length of text.
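The select-then-reason loop named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the word-overlap `overlap_judge` stands in for the trained judge model, and the names `recall_key_blocks`, `budget`, `steps`, and `keep`, as well as the exact form of the rehearsal and decay steps, are hypothetical. The point is only that the blocks passed on to the reasoner stay within a fixed budget regardless of how long the text is.

```python
from typing import Callable, List, Sequence

Judge = Callable[[str, Sequence[str]], float]


def overlap_judge(block: str, context: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained judge: number of words shared with the context."""
    context_words = {w for text in context for w in text.lower().split()}
    return float(len(set(block.lower().split()) & context_words))


def recall_key_blocks(
    query: str,
    blocks: Sequence[str],
    judge: Judge = overlap_judge,
    budget: int = 12,  # assumed working-memory size, in words
    steps: int = 2,    # number of retrieval rounds
    keep: int = 1,     # blocks that survive decay after each round
) -> List[str]:
    kept: List[int] = []      # indices carried over between rounds
    selected: List[int] = []  # indices held in working memory this round
    for _ in range(steps):
        # Score unseen blocks against the query plus whatever survived so far.
        context = [query] + [blocks[i] for i in kept]
        scores = {i: judge(blocks[i], context)
                  for i in range(len(blocks)) if i not in kept}
        selected = list(kept)
        used = sum(len(blocks[i].split()) for i in selected)
        for i in sorted(scores, key=scores.get, reverse=True):
            size = len(blocks[i].split())
            if scores[i] > 0 and used + size <= budget:
                selected.append(i)
                used += size

        # Rehearsal (assumed form): re-score each selected block against the others.
        def rehearse(i: int) -> float:
            others = [query] + [blocks[j] for j in selected if j != i]
            return judge(blocks[i], others)

        # Decay (assumed form): only the top `keep` blocks carry into the next round.
        kept = sorted(selected, key=rehearse, reverse=True)[:keep]
    # Concatenate in document order; a reasoner would consume this short text.
    return [blocks[i] for i in sorted(selected)]
```

In this toy setup, a second round can pick up a block that shares no words with the query but does overlap with a block retained in the first round, which is the kind of multi-step behaviour the abstract attributes to rehearsal and decay.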