Dropout Token To Improve Neural Language Model

Peng Jin, Lingjiao Xu, Bing Wang, Xingyuan Chen
DOI: https://doi.org/10.1109/CIS54983.2021.00027
2021-11-01
Abstract: Dropout is an effective method for avoiding over-fitting when training neural networks and is widely used in both computer vision and natural language processing. The typical approach drops out hidden and visible units; neural language models in particular usually apply dropout to hidden units. However, few studies apply dropout to the input. In this study, we apply dropout to the input token sequence. This is similar to masking, but the critical difference is that the dropped tokens are not predicted at all. Experiments are conducted on two benchmark datasets, EMNLP2017 WMT News and Penn Tree Bank. The experimental results show that our method significantly outperforms the baseline model.
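The abstract gives no implementation details, so the following is only a minimal sketch of what dropout on an input token sequence could look like for a next-token language model. It assumes that each input token is independently replaced by a reserved placeholder id (the hypothetical `DROP_ID`, assumed here to be 0) with probability `p`, that targets are taken from the original, undropped sequence, and that no loss term asks the model to recover the dropped ids. That last point is one reading of "the dropped tokens are not predicted at all", in contrast to masked-token prediction. The function names `token_dropout` and `make_lm_example` are illustrative, not the authors' code.

```python
import random
from typing import List, Tuple

# Hypothetical placeholder id for dropped positions. The paper does not say
# how dropped tokens are represented; this sketch assumes id 0 is reserved.
DROP_ID = 0


def token_dropout(tokens: List[int], p: float, rng: random.Random) -> List[int]:
    """Independently replace each token with DROP_ID with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return [DROP_ID if rng.random() < p else t for t in tokens]


def make_lm_example(
    tokens: List[int], p: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Build one (input, target) pair for next-token prediction.

    Dropout touches only the input side. Targets are the original next
    tokens, and there is no extra objective to reconstruct the dropped ids
    (unlike masked-token prediction). This is an assumed setup.
    """
    inputs = token_dropout(tokens[:-1], p, rng)
    targets = tokens[1:]
    return inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    seq = [5, 17, 42, 8, 99, 23]
    # Training: drop roughly 30% of input tokens.
    print(make_lm_example(seq, p=0.3, rng=rng))
    # Evaluation: p = 0 leaves the input unchanged.
    print(make_lm_example(seq, p=0.0, rng=rng))
```

As with ordinary dropout, the drop probability would be set to 0 at evaluation time, so the model sees complete inputs when it is scored.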