Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di, Bhuvan Middha, Fangzhao Wu, Xing Xie
DOI: https://doi.org/10.1145/3534678.3539120
2022-01-01
Abstract: News recommendation calls for deep insight into the underlying semantics of news articles. Pretrained language models (PLMs) such as BERT and RoBERTa may therefore substantially improve recommendation quality. However, it is extremely challenging to train news recommenders together with such big models: learning a news recommender requires intensive news encoding operations, whose cost is prohibitive if PLMs are used as the news encoder. In this paper, we propose a novel framework, SpeedyFeed, which efficiently trains PLM-based news recommenders of superior quality. SpeedyFeed is distinguished by its lightweight encoding pipeline, which gives rise to three major advantages. First, it makes the intermediate results fully reusable within the training workflow, which removes most of the repetitive, redundant encoding operations. Second, it improves the data efficiency of the training workflow, since non-informative data can be excluded from encoding. Third, it further reduces cost by leveraging simplified news encoding and compact news representations. SpeedyFeed accelerates the training process by more than 100x, which enables big models to be trained efficiently and effectively over massive user data. The well-trained PLM-based model significantly outperforms state-of-the-art news recommenders in comprehensive offline experiments. It is applied to Microsoft News to empower the training of large-scale production models, which demonstrate highly competitive online performance. SpeedyFeed is also a model-agnostic framework and is thus potentially applicable to a wide spectrum of content-based recommender systems. We have made the source code publicly available to facilitate research and applications in related areas.