Abstract: Large language models (LLMs) have achieved remarkable success across various domains, but effectively incorporating complex and potentially noisy user timeline data into LLMs remains a challenge. Current approaches often involve translating user timelines into text descriptions before feeding them to LLMs, which can be inefficient and may not fully capture the nuances of user behavior. Inspired by how LLMs are effectively integrated with images through direct embeddings, we propose User-LLM, a novel framework that leverages user embeddings to directly contextualize LLMs with user history interactions. These embeddings, generated by a user encoder pretrained using self-supervised learning on diverse user interactions, capture latent user behaviors and interests as well as their evolution over time. We integrate these user embeddings with LLMs through cross-attention, enabling LLMs to dynamically adapt their responses based on the context of a user's past actions and preferences. Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X. Comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate that User-LLM outperforms text-prompt-based contextualization on tasks requiring deep user understanding, with improvements of up to 16.33%, particularly excelling on long sequences that capture subtle shifts in user behavior. Furthermore, the incorporation of Perceiver layers streamlines the integration between user encoders and LLMs, yielding additional computational savings.
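The cross-attention and Perceiver integration described in the abstract can be illustrated with a minimal pure-Python sketch. It is not User-LLM's implementation: it assumes a single attention head, identity query/key/value projections, and hand-set rather than learned latent queries, and the function names (`attend`, `perceiver_compress`, `cross_attend`) and toy vectors are hypothetical. What it shows is the shape argument behind the efficiency claim: however long the timeline is, the text sequence the LLM processes keeps its length, and the Perceiver-style step reduces the timeline to a fixed number of vectors for the text tokens to attend to.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one inner list per token or event vector


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.

    Assumption: identity Q/K/V projections, so keys and values are the raw vectors.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, kv)) / scale for kv in keys_values]
        weights = softmax(scores)
        # Weighted average of the value vectors for this query.
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(len(keys_values[0]))])
    return out


def perceiver_compress(user_embs: Matrix, latents: Matrix) -> Matrix:
    """Hypothetical Perceiver-style step: a fixed set of latent queries attends over
    the timeline, so the output length is len(latents), not len(user_embs)."""
    return attend(latents, user_embs)


def cross_attend(text_states: Matrix, user_embs: Matrix) -> Matrix:
    """Text-token states query the user embeddings; the residual add keeps
    exactly one output row per text token."""
    ctx = attend(text_states, user_embs)
    return [[h + c for h, c in zip(h_row, c_row)] for h_row, c_row in zip(text_states, ctx)]


dim = 4
timeline = [[math.sin(i + j) for j in range(dim)] for i in range(50)]  # 50 toy events
latents = [[0.1 * (k + j) for j in range(dim)] for k in range(8)]  # 8 hand-set latents
text = [[0.5 * j for j in range(dim)] for _ in range(6)]  # 6 toy text-token states

compressed = perceiver_compress(timeline, latents)
out = cross_attend(text, compressed)
print(len(compressed), len(out), len(out[0]))  # 8 6 4
```

Because the timeline enters only as keys and values, 50 (or 500) events cost 8 attended vectors rather than the many prompt tokens a text description would add; in a real model the latents and projections would be learned.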

What problem does this paper attempt to address?

The paper addresses how to effectively integrate complex and potentially noisy user timeline data into large language models (LLMs). Existing methods usually convert the user timeline into a text description before feeding it to the LLM, which is inefficient and may fail to capture subtle patterns in user behavior. To address this, the paper proposes User-LLM, a framework that contextualizes LLMs directly with user embeddings, allowing the LLM to adapt its responses to a user's past behaviors and preferences.

Specifically, User-LLM uses a user encoder pretrained with self-supervised learning on diverse user interactions. The resulting embeddings capture users' latent behaviors and interests, as well as how these change over time. The embeddings are integrated into the LLM through cross-attention, so the LLM can condition its responses on the context of the user's past actions and preferences.

The main contributions of the paper are:
1. Introducing the User-LLM framework, which uses user embeddings to directly contextualize LLMs, enabling them to adapt dynamically to users' past behaviors and preferences.
2. Conducting extensive experiments on three public datasets (MovieLens, Amazon Review, and Google Local Review), showing that User-LLM significantly outperforms text-prompt-based contextualization on tasks requiring deep user understanding, especially with long-context inputs.
3. Providing a comprehensive analysis of user embedding generation architectures, encoder-LLM co-training strategies, and different ways of integrating embeddings with LLMs, and of how these choices affect LLM personalization, offering insights for future research.
4. Exploring in depth the cross-attention mechanism for integrating user embeddings, including gated cross-attention to understand how user embeddings influence LLM behavior, and introducing Perceiver layers to further improve computational efficiency.

In this way, User-LLM not only processes user timeline data more efficiently but also improves LLM performance on tasks such as personalized recommendation and question answering.
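The gated cross-attention mentioned in contribution 4 can be sketched as follows. This assumes a Flamingo-style tanh gate on a residual cross-attention branch; the summary does not give User-LLM's exact formulation, and the names (`gated_cross_attention`, `shift`) and toy vectors are hypothetical. The sketch illustrates how a scalar gate makes the influence of user embeddings measurable: with the gate closed, the LLM's hidden states are unchanged, and opening it shifts them toward the attended user context.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one inner list per vector


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head attention with identity Q/K/V projections (a simplification)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, kv)) / scale for kv in keys_values])
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(len(keys_values[0]))])
    return out


def gated_cross_attention(text_states: Matrix, user_embs: Matrix, alpha: float) -> Matrix:
    """Assumed Flamingo-style gate: h + tanh(alpha) * attend(h, user_embs).

    alpha = 0 closes the gate (user context is ignored); a larger |alpha|
    lets more user context into the hidden states."""
    gate = math.tanh(alpha)
    ctx = attend(text_states, user_embs)
    return [[h + gate * c for h, c in zip(h_row, c_row)]
            for h_row, c_row in zip(text_states, ctx)]


def shift(a: Matrix, b: Matrix) -> float:
    """Total absolute change between two sets of hidden states."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


text = [[1.0, 0.0, 0.5], [0.2, 0.3, 0.1]]  # 2 toy text-token states
user = [[0.5, 1.0, -0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 3 toy user embeddings

closed = gated_cross_attention(text, user, alpha=0.0)
opened = gated_cross_attention(text, user, alpha=2.0)
print(closed == text)  # True: a closed gate leaves the hidden states untouched
print(shift(closed, text) < shift(opened, text))  # True
```

In Flamingo-style designs the gate is a learned parameter initialized at zero, so the pretrained LLM's behavior is preserved at the start of training; inspecting the learned gate values is one way to gauge how much the model relies on user context.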

User-LLM: Efficient LLM Contextualization with User Embeddings

LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification

LLMs are Also Effective Embedding Models: An In-depth Overview

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

LLMs + Persona-Plug = Personalized LLMs

EmbedLLM: Learning Compact Representations of Large Language Models

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Unleashing the Power of LLMs as Multi-Modal Encoders for Text and Graph-Structured Data

Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

InfMLLM: A Unified Framework for Visual-Language Tasks

LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations

Understanding User Experience in Large Language Model Interactions

EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning

ModaVerse: Efficiently Transforming Modalities with LLMs

Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

Extensible Embedding: A Flexible Multipler For LLM's Context Length

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Make Your LLM Fully Utilize the Context