Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng
2024-09-28
Abstract: Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) homogeneity issue -- generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias, and it incorporates a text-free assistant model to encourage tokens less frequently generated by LLMs for counteracting recommendation homogeneity. Extensive experiments on real-world datasets demonstrate the method's effectiveness in enhancing accuracy and diversity.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses two problems that arise in the decoding process when large language models (LLMs) are applied to recommendation: **Amplification Bias** and the **Homogeneity Issue**.

### Amplification Bias

1. **Problem Description**:
   - When generating recommended items, some items contain tokens whose generation probability is close to 1 (referred to as "ghost tokens"). Ghost tokens barely lower the sequence's probability product, yet each one still counts toward the sequence length. Because existing decoding methods apply length normalization to every token, the scores of items containing ghost tokens are inflated.
2. **Formula Explanation**:
   - The probability of the generated sequence is:
     \[ p(y|x)=\prod_{i = 1}^{m}p(y_i|x,y_{<i}) \]
   - Length normalization rescales the score of a generated sequence as:
     \[ S(h)\leftarrow\frac{S(h)}{L^{\alpha}} \]
     where \(S(h)\) is the score of the generated sequence \(h\) (typically its accumulated log-probability), \(L\) is the length of the generated sequence, and \(\alpha\) is a hyperparameter that controls the length penalty.
3. **Impact**:
   - The recommendation results become biased towards items that contain ghost tokens, which can hurt both the accuracy and the diversity of the recommendations.

### Homogeneity Issue

1. **Problem Description**:
   - With the original decoding method, LLMs tend to generate multiple recommended items with similar structure and content, especially when several items are recommended to a user at once. For example, the model may recommend multiple products from the same series or category (such as "PlayStation 3" and "PlayStation 4"). The model also tends to repeat items whose features already appear in the user's past interactions.
2. **Cause Analysis**:
   - Textually similar sequences usually receive similar scores, and LLMs inherit a match-and-copy mechanism. Together these make the recommendation results lack diversity.
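
To make the amplification bias concrete, the following minimal Python sketch compares two hypothetical items under length normalization. The token probabilities, the function names, and the choice of \(\alpha = 1\) are illustrative assumptions, not values from the paper; the score is assumed to be the sum of token log-probabilities.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of token log-probabilities, i.e. log of the product p(y|x)."""
    return sum(math.log(p) for p in token_probs)


def length_normalized_score(token_probs, alpha=1.0):
    """Standard length normalization: S(h) / L**alpha."""
    return sequence_log_prob(token_probs) / (len(token_probs) ** alpha)


# Hypothetical items: item_b is item_a plus two near-certain "ghost tokens".
item_a = [0.5, 0.4]
item_b = [0.5, 0.4, 0.999, 0.999]

raw_a, raw_b = sequence_log_prob(item_a), sequence_log_prob(item_b)
norm_a, norm_b = length_normalized_score(item_a), length_normalized_score(item_b)

print(f"raw:        item_a={raw_a:.4f}  item_b={raw_b:.4f}")
print(f"normalized: item_a={norm_a:.4f}  item_b={norm_b:.4f}")
```

Before normalization, `item_b` is slightly less likely than `item_a`, since the ghost tokens only add a tiny negative term. After dividing by the length, `item_b` ranks well above `item_a`, because its nearly unchanged log-probability is spread over twice as many tokens.
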
### Solution

To address the above problems, the authors propose a new decoding method called **Debiasing-Diversifying Decoding (D3)**:

1. **Alleviating Amplification Bias**:
   - D3 applies length normalization selectively: it is disabled for ghost tokens, so these tokens no longer amplify an item's score.
2. **Solving the Homogeneity Issue**:
   - D3 adds the score of a text-free assistant model at each decoding step to guide token generation. Because this model does not operate on text, it is not affected by repeated text and can suggest meaningful, non-repetitive tokens, which improves the diversity of the recommendations.

### Experimental Verification

In experiments on six real-world datasets, D3 outperforms the baseline methods in recommendation accuracy (Hit Ratio and NDCG) and also improves recommendation diversity.

In conclusion, the paper improves the performance of LLM-based recommendation by identifying and addressing amplification bias and the homogeneity issue in the decoding process.
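
The two D3 components summarized above can be illustrated with a minimal Python sketch. This is one possible reading of the summary, not the authors' implementation: the ghost-token threshold `ghost_threshold`, the rule of counting only non-ghost tokens in the length, and the additive `weight` used to combine LLM and assistant scores are all assumptions introduced here for illustration.

```python
import math


def debiased_score(token_probs, alpha=1.0, ghost_threshold=0.99):
    """Length-normalize using only non-ghost tokens (assumed rule).

    Tokens with probability >= ghost_threshold are treated as ghost tokens
    and excluded from the length; the threshold value is hypothetical.
    """
    log_prob = sum(math.log(p) for p in token_probs)
    effective_len = sum(1 for p in token_probs if p < ghost_threshold)
    effective_len = max(effective_len, 1)  # avoid division by zero
    return log_prob / (effective_len ** alpha)


def combined_step_scores(llm_log_probs, assistant_log_probs, weight=1.0):
    """Add a weighted assistant-model score to each candidate token's LLM score.

    Both arguments map candidate tokens to log-probabilities; the additive
    combination and `weight` are assumptions, not the paper's exact formula.
    """
    return {
        token: lp + weight * assistant_log_probs[token]
        for token, lp in llm_log_probs.items()
    }


# Ghost tokens no longer lift item_b above item_a.
item_a = [0.5, 0.4]
item_b = [0.5, 0.4, 0.999, 0.999]
print(debiased_score(item_a), debiased_score(item_b))

# Hypothetical single decoding step: the assistant favors the less repeated token.
llm = {"PlayStation": math.log(0.6), "Xbox": math.log(0.4)}
assistant = {"PlayStation": math.log(0.2), "Xbox": math.log(0.8)}
scores = combined_step_scores(llm, assistant, weight=1.0)
print(max(scores, key=scores.get))
```

With these toy numbers, the LLM alone would pick "PlayStation", while the combined score prefers "Xbox", which shows how an assistant score can steer decoding away from tokens the LLM tends to over-generate.
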