Investigating Recurrent Transformers with Dynamic Halt

Jishnu Ray Chowdhury, Cornelia Caragea
2024-09-03
Abstract: In this paper, we comprehensively study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) incorporating a depth-wise recurrence similar to Universal Transformers; and (2) incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine these methods; for example, we propose a global mean-based dynamic halting mechanism for Universal Transformers and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformers. We compare the models and probe their inductive biases on several diagnostic tasks, such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference. The code is released at: https://github.com/JRC1995/InvestigatingRecurrentTransformers/tree/main
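The abstract does not spell out how the global mean-based dynamic halting works. As a rough illustration only, the following sketch assumes an ACT-style (Adaptive Computation Time) loop. One shared layer is reapplied at every depth step (the Universal Transformer-style depth-wise recurrence), and after each step a single halting probability is computed from the mean of the hidden states over all positions, so the whole sequence halts together rather than token by token. The names `mean_halting_recurrence`, `shared_layer`, `halt_weights`, `halt_bias`, and `eps` are hypothetical, and the authors' actual formulation may differ.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Sequence = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def global_mean(states: Sequence) -> Vector:
    """Average the hidden states over all sequence positions."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


def mean_halting_recurrence(
    states: Sequence,
    shared_layer: Callable[[Sequence], Sequence],
    halt_weights: Vector,
    halt_bias: float,
    max_steps: int = 8,
    eps: float = 0.01,
) -> Tuple[Sequence, int]:
    """Hypothetical ACT-style depth recurrence with one sequence-level halt.

    Assumption: the halting probability is a sigmoid of a linear function of
    the global mean state. Returns the halting-weighted output and the number
    of recurrent steps taken.
    """
    dim = len(states[0])
    output = [[0.0] * dim for _ in states]
    cumulative = 0.0  # total halting weight assigned so far
    steps = 0
    for step in range(1, max_steps + 1):
        # Depth-wise recurrence: the same layer is reapplied at each step.
        states = shared_layer(states)
        steps = step
        # One halting probability for the whole sequence, from the global mean.
        pooled = global_mean(states)
        p = sigmoid(sum(w * x for w, x in zip(halt_weights, pooled)) + halt_bias)
        halt = cumulative + p >= 1.0 - eps or step == max_steps
        # On the final step, use the ACT remainder so the weights sum to 1.
        weight = 1.0 - cumulative if halt else p
        for out_row, row in zip(output, states):
            for d in range(dim):
                out_row[d] += weight * row[d]
        cumulative += weight
        if halt:
            break
    return output, steps


# Toy usage: a "layer" that adds 1.0 to every hidden unit.
shift = lambda seq: [[x + 1.0 for x in v] for v in seq]
out, n_steps = mean_halting_recurrence([[0.0, 0.0], [2.0, 2.0]], shift, [0.0, 0.0], 0.0)
print(n_steps, out)
```

With zero `halt_weights` and `halt_bias=0.0`, every step has a halting probability of 0.5, so the loop stops after two steps and returns an equal mix of the two intermediate states. Because the halting signal is pooled over the sequence, every position receives the same number of recurrent steps.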
Machine Learning, Artificial Intelligence, Neural and Evolutionary Computing
What problem does this paper attempt to address?