Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning
Aleksandr Petrov, Craig Macdonald
2024-03-08
Abstract: Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve
state-of-the-art performance in the sequential recommendation task according to
accuracy-based metrics, such as NDCG. These models treat items as tokens and
then utilise a score-and-rank approach (Top-K strategy), where the model first
computes item scores and then ranks the items by these scores. While this
approach works well for accuracy-based metrics, it is hard to use for
optimising more complex beyond-accuracy metrics such as diversity. Recently,
the GPTRec model, which uses a different Next-K strategy, has been proposed as
an alternative to the Top-K models. In contrast with traditional Top-K
recommendations, Next-K generates recommendations item-by-item and, therefore,
can account for complex item-to-item interdependencies important for the
beyond-accuracy measures. However, the original GPTRec paper focused only on
accuracy in its experiments and did not address how to optimise the model for
complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy
goals is challenging because the interaction data available for training
recommender systems is typically not aligned with beyond-accuracy
recommendation goals. To solve the misalignment problem, we train GPTRec using
a 2-stage approach: in the first stage, we use a teacher-student approach to
train GPTRec, mimicking the behaviour of traditional Top-K models; in the
second stage, we use Reinforcement Learning to align the model for
beyond-accuracy goals. In particular, we experiment with increasing
recommendation diversity and reducing popularity bias. Our experiments on two
datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach
offers a better tradeoff between accuracy and secondary metrics than classic
greedy re-ranking techniques.
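
The difference between the Top-K and Next-K strategies described in the abstract can be illustrated with a minimal Python sketch. It is not GPTRec itself: the function names (top_k, next_k, toy_score), the item scores, the genres and the 0.5 same-genre penalty are all hypothetical, and the hand-written toy_score only stands in for a learned model whose score for the next item depends on the items already generated.

```python
from typing import Callable, Dict, List, Sequence


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Score-and-rank: each item is scored once, independently of the others."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def next_k(
    score_fn: Callable[[str, Sequence[str]], float],
    candidates: Sequence[str],
    k: int,
) -> List[str]:
    """Item-by-item generation: every step rescores the remaining items
    conditioned on the items that have already been chosen."""
    chosen: List[str] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda item: score_fn(item, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: base relevance scores and one genre per item.
base = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
genre = {"a": "action", "b": "action", "c": "drama", "d": "comedy"}


def toy_score(item: str, chosen: Sequence[str]) -> float:
    """Assumed stand-in for a learned conditional score: relevance minus
    a penalty for each already-chosen item of the same genre."""
    repeats = sum(genre[c] == genre[item] for c in chosen)
    return base[item] - 0.5 * repeats


print(top_k(base, 3))                     # ['a', 'b', 'c']
print(next_k(toy_score, list(base), 3))   # ['a', 'c', 'd']
```

In this toy setting, Top-K returns two action items because each score ignores the rest of the list, while the item-by-item loop picks three different genres because each step sees what has already been recommended.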
Machine Learning, Information Retrieval