Exploring Large Language Models Text Style Transfer Capabilities

Weijie Li, Zhentao Gu, Xiaochao Fan, Wenjun Deng, Yong Yang, Xinyuan Zhao, Yufeng Diao, Liang Yang
DOI: https://doi.org/10.3233/faia240865
2024-01-01
Abstract: The emergence of Large Language Models (LLMs) provides a new approach to highly complex text generation tasks such as text style transfer (TST). However, previous studies have not fully explored the TST capabilities of different LLMs, and they have lacked uniform standards in the human evaluation stage. This makes human evaluation results difficult to reproduce and less credible. To address this, this paper designs a prompt template that guides cutting-edge LLMs to perform effective text style transfer and carries out an in-depth comparative analysis of various small-scale language models. In the human evaluation stage, this paper eschews the conventional rating system and instead adopts a comparative human assessment methodology, which we refer to as duel-ranking. Rather than scoring each output directly, this method determines the relative ranking of models through mutual comparison. Detailed evaluation instructions are provided to enhance the reproducibility of the method and to ensure consistency throughout the evaluation process. This manual evaluation reveals that GPT-3.5 and GPT-4 perform excellently on TST tasks.
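The abstract says duel-ranking orders models by mutual comparison rather than by direct scores, but it does not state how pairwise judgments are aggregated. The sketch below illustrates one plausible aggregation under stated assumptions: each human judgment is a hypothetical `(model_a, model_b, winner)` record, and models are ranked by round-robin tournament scoring (1 point per win, 0.5 per tie). The function name `duel_rank`, the record format, and the scoring rule are all illustrative assumptions, not the paper's procedure.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# One duel = (model_a, model_b, winner); winner is model_a, model_b,
# or None for a tie. This record format is a hypothetical assumption.
Duel = Tuple[str, str, Optional[str]]


def duel_rank(
    models: Iterable[str], duels: Iterable[Duel]
) -> Tuple[List[str], Dict[str, float]]:
    """Aggregate pairwise human judgments into a ranking.

    Assumed scoring rule (round-robin tournament style): a win earns
    1 point and a tie earns 0.5 points for each side. The paper's
    actual aggregation rule may differ.
    """
    scores: Dict[str, float] = {m: 0.0 for m in models}
    for a, b, winner in duels:
        if a not in scores or b not in scores:
            raise ValueError(f"unknown model in duel: {a!r} vs {b!r}")
        if winner is None:
            scores[a] += 0.5
            scores[b] += 0.5
        elif winner in (a, b):
            scores[winner] += 1.0
        else:
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
    # Higher score ranks first; equal scores are ordered alphabetically
    # so the result is deterministic.
    ranking = sorted(scores, key=lambda m: (-scores[m], m))
    return ranking, scores


if __name__ == "__main__":
    # Invented toy judgments with placeholder names, not the paper's data.
    models = ["model_A", "model_B", "model_C"]
    duels = [
        ("model_A", "model_B", "model_A"),
        ("model_A", "model_C", "model_A"),
        ("model_B", "model_C", None),
    ]
    ranking, scores = duel_rank(models, duels)
    print(ranking)  # ['model_A', 'model_B', 'model_C']
```

Counting wins over all pairwise duels only needs annotators to answer "which output is better?", which is the kind of relative judgment the abstract contrasts with direct scoring. Other aggregators (for example, a Bradley-Terry fit) would also fit the description.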