Reducing BERT Computation by Padding Removal and Curriculum Learning

Wei Zhang, Wei, Wen Wang, Lingling Jin, Zheng Cao
DOI: https://doi.org/10.1109/ispass51385.2021.00025
2021-01-01
Abstract: BERT [1] is very computationally expensive, which is a hurdle for its training and deployment. This work focuses on removing the unnecessary computation due to input padding in BERT. The input of BERT consists of two concatenated sentences. If the length of the two concatenated sentences is shorter than the maximum sequence length, padding must be added to the end of the sentences to fill the empt...
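To make the padding overhead described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It right-pads concatenated sentence pairs to a fixed maximum sequence length and reports what fraction of token positions are padding. The names `PAD_ID`, `pad_batch`, and `padding_fraction`, the token ids, and the maximum length of 16 are all illustrative assumptions.

```python
from typing import List, Tuple

PAD_ID = 0  # assumed padding token id (illustrative)


def pad_batch(seqs: List[List[int]], max_len: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad each token-id sequence to max_len.

    Returns the padded sequences and a mask (1 = real token, 0 = padding).
    """
    padded, mask = [], []
    for s in seqs:
        if len(s) > max_len:
            raise ValueError("sequence longer than max_len")
        n_pad = max_len - len(s)
        padded.append(s + [PAD_ID] * n_pad)
        mask.append([1] * len(s) + [0] * n_pad)
    return padded, mask


def padding_fraction(mask: List[List[int]]) -> float:
    """Fraction of token positions in the batch that are padding."""
    total = sum(len(row) for row in mask)
    real = sum(sum(row) for row in mask)
    return (total - real) / total


# Two hypothetical concatenated sentence pairs (placeholder token ids).
batch = [
    [101, 7, 8, 102, 9, 102],             # 6 real tokens
    [101, 5, 102, 6, 11, 12, 13, 102],    # 8 real tokens
]
padded, mask = pad_batch(batch, max_len=16)
print(padding_fraction(mask))  # 18 of 32 positions are padding -> 0.5625
```

In this toy batch more than half of the positions carry no information, yet a model that processes the padded tensor as-is still spends computation on them; this is the kind of waste the abstract refers to.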