Abstract: Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve state-of-the-art performance in the sequential recommendation task according to accuracy-based metrics such as NDCG. These models treat items as tokens and use a score-and-rank approach (the Top-K strategy): the model first computes a score for each item and then ranks the items by this score. While this approach works well for accuracy-based metrics, it is hard to use for optimising more complex beyond-accuracy metrics such as diversity. Recently, the GPTRec model, which uses a different Next-K strategy, has been proposed as an alternative to Top-K models. In contrast with traditional Top-K recommendation, Next-K generates recommendations item by item and can therefore account for the complex item-to-item interdependencies that matter for beyond-accuracy measures. However, the original GPTRec paper focused only on accuracy in its experiments and did not address how to optimise the model for complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy goals is challenging because the interaction data available for training recommender systems is typically not aligned with beyond-accuracy recommendation goals. To solve this misalignment problem, we train GPTRec using a two-stage approach: in the first stage, we use a teacher-student approach to train GPTRec to mimic the behaviour of traditional Top-K models; in the second stage, we use Reinforcement Learning to align the model with beyond-accuracy goals. In particular, we experiment with increasing recommendation diversity and reducing popularity bias. Our experiments on two datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
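The structural difference between the two strategies can be illustrated with a small sketch. It is not GPTRec: the abstract does not describe the model's architecture. Instead, it contrasts plain score-and-rank (Top-K) with an MMR-style greedy re-ranker. That re-ranker is one example of the "classic greedy re-ranking" family and is used here only as a stand-in for item-by-item selection. The item-similarity matrix `sim`, the trade-off weight `lam`, and the intra-list diversity function are illustrative assumptions, not the paper's exact setup.

```python
from itertools import combinations

import numpy as np


def top_k(scores: np.ndarray, k: int) -> list[int]:
    """Top-K: score every item independently, then sort by score."""
    return [int(i) for i in np.argsort(-scores)[:k]]


def next_k_greedy(scores: np.ndarray, sim: np.ndarray, k: int, lam: float) -> list[int]:
    """Item-by-item selection (MMR-style, illustrative stand-in).

    At each step, the remaining candidates are re-scored conditioned on the
    items already chosen: relevance is traded off against similarity to the
    most similar selected item. With lam=0 this reduces to Top-K (absent ties).
    """
    selected: list[int] = []
    candidates = set(range(len(scores)))
    for _ in range(k):
        def conditional_score(i: int) -> float:
            # Penalty depends on the partial list, unlike a Top-K score.
            penalty = max((sim[i, j] for j in selected), default=0.0)
            return (1 - lam) * scores[i] - lam * penalty

        best = max(candidates, key=conditional_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def intra_list_diversity(items: list[int], sim: np.ndarray) -> float:
    """Mean pairwise dissimilarity (1 - sim) over a recommendation list."""
    pairs = list(combinations(items, 2))
    return sum(1 - sim[i, j] for i, j in pairs) / len(pairs)


# Hypothetical toy data: items 0 and 1 are near-duplicates.
scores = np.array([0.9, 0.85, 0.8, 0.3])
sim = np.array([
    [1.0, 0.95, 0.1, 0.0],
    [0.95, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
])

print(top_k(scores, 2))                        # [0, 1]
print(next_k_greedy(scores, sim, 2, lam=0.5))  # [0, 2]
```

Top-K picks the two near-duplicates because each score ignores the rest of the list. The item-by-item selector swaps item 1 for the less similar item 2, which raises intra-list diversity from about 0.05 to 0.9 at a small relevance cost. That is the kind of accuracy/diversity tradeoff the abstract measures. A Next-K model such as GPTRec makes the same sequential choice, but it learns the conditioning rather than using a hand-set penalty.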

Improving Next Tokens via Second-Last Predictions with Generate and Refine

Better & Faster Large Language Models via Multi-token Prediction

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding

Loop Neural Networks for Parameter Sharing

RecycleGPT: An Autoregressive Language Model with Recyclable Module

Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction

Incorporating BERT into Parallel Sequence Decoding with Adapters

The pitfalls of next-token prediction

Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning

A Critical Look At Tokenwise Reward-Guided Text Generation

Auto-Regressive Next-Token Predictors are Universal Learners

Output Layer Go First: Better Fine-tuning by Bridging the Gap with Pre-training

Autoregressive Modeling with Lookahead Attention

σ-GPTs: A New Approach to Autoregressive Models

Optimizing small BERTs trained for German NER

RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use

Language models are better than humans at next-token prediction

Generative Sequential Recommendation with GPTRec

Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

SGPT: GPT Sentence Embeddings for Semantic Search