Counting in Language with RNNs

Heng xin Fun, Sergiy V Bokhnyak, Francesco Saverio Zuppichini
DOI: https://doi.org/10.48550/arXiv.1810.12411
2018-10-31
Abstract: In this paper we examine a possible reason why the LSTM outperforms the GRU on language modeling and, more specifically, machine translation. We hypothesize that the difference comes down to counting, a consistent theme across the literature on long-term dependence, counting, and language modeling for RNNs. Using simplified forms of language (context-free and context-sensitive languages), we show exactly how the LSTM performs its counting through its cell state during inference and why the GRU cannot perform as well.
Machine Learning, Neural and Evolutionary Computing
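
To make the counting idea in the abstract concrete, here is a minimal sketch of how an LSTM-style cell state can act as a counter on the context-free language a^n b^n. It is not the paper's trained network: the single-unit cell, the +1/-1 input encoding, the function name `cell_state_trace`, and the parameters `gate_bias` and `cand_scale` are all assumptions chosen by hand so that the forget and input gates saturate near 1 and the candidate value saturates near +1 for 'a' and -1 for 'b'.

```python
import math


def sigmoid(x):
    """Logistic function used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_state_trace(string, gate_bias=10.0, cand_scale=10.0):
    """Run a hand-set, one-unit LSTM-style cell over a string of 'a'/'b'.

    Assumptions (hypothetical, not taken from the paper):
      - 'a' is encoded as +1.0 and 'b' as -1.0;
      - forget and input gates depend only on a large bias, so both are ~1;
      - the candidate is tanh(cand_scale * x), so it is ~+1 or ~-1.
    Under these settings the update c_t = f * c_{t-1} + i * g
    approximately adds 1 for each 'a' and subtracts 1 for each 'b'.
    """
    c = 0.0
    trace = []
    for ch in string:
        if ch not in ("a", "b"):
            raise ValueError(f"unexpected symbol: {ch!r}")
        x = 1.0 if ch == "a" else -1.0
        f = sigmoid(gate_bias)          # forget gate, close to 1: keep the count
        i = sigmoid(gate_bias)          # input gate, close to 1: accept the update
        g = math.tanh(cand_scale * x)   # candidate, close to +1 or -1
        c = f * c + i * g               # additive cell-state update
        trace.append(c)
    return trace


if __name__ == "__main__":
    # Rounded cell state after each symbol of "aaabbb".
    print([round(c) for c in cell_state_trace("aaabbb")])
```

With these hand-picked values the rounded cell state rises by one per 'a' and falls by one per 'b', returning to roughly zero exactly when the numbers of a's and b's match. The forget gate is only approximately 1, so the count drifts slightly on long strings.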