On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi
2024-06-24
Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?