Abstract: Conversational recommender systems (CRSs) have recently been proposed to mitigate the cold-start problem that traditional recommender systems suffer from. By introducing conversational key-terms, existing conversational recommenders can reduce the need for extensive exploration and elicit user preferences faster and more accurately. However, these recommenders rely heavily on the availability and quality of the key-terms, and their performance may degrade significantly when the key-terms are incomplete or poorly labeled. This commonly happens when new items are continually added to the system and acquiring well-labeled key-terms would require costly human effort. Moreover, existing CRS methods use the feedback on different conversational key-terms separately, without considering the underlying relations between them. As a result, learning in conversational recommenders is sample-inefficient, especially when there is a large number of candidate key-terms. In this paper, we propose a knowledge-aware conversational preference elicitation framework and a bandit-based algorithm, GraphConUCB. To elicit preferences efficiently for items whose key-terms are incompletely labeled, our algorithm leverages the underlying relations between key-terms, guided by a knowledge graph. Being knowledge-aware, it propagates user preferences through a pseudo graph feedback module, which also accelerates exploration in the large action space of key-terms and improves conversational sample efficiency. To select the most informative key-terms in the graph for conducting conversations, we further devise a graph-based optimal design module that exploits the graph structure. We provide a theoretical analysis of the regret upper bound of GraphConUCB.
Extensive experiments show that our algorithm effectively handles items with incompletely labeled key-terms and significantly outperforms state-of-the-art baselines.
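The abstract describes GraphConUCB only at a high level. The sketch below is not the paper's algorithm; it is a simplified illustration, under our own assumptions, of two ideas named above: propagating feedback on one key-term to related key-terms in a knowledge graph (pseudo graph feedback), and choosing the next key-term to ask about by its uncertainty. The class `GraphFeedbackEstimator`, the `pseudo_weight` parameter, the linear preference model, the toy key-terms, and the `x^T V^{-1} x` selection rule (a simple stand-in for the paper's graph-based optimal design) are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; `a` is untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class GraphFeedbackEstimator:
    """Hypothetical sketch: weighted ridge regression on key-term feedback, where
    each real answer also adds down-weighted pseudo feedback for the key-term's
    neighbours in a knowledge graph."""

    def __init__(self, dim: int, graph: Dict[str, List[str]],
                 features: Dict[str, Vector], lam: float = 1.0,
                 pseudo_weight: float = 0.5):
        self.graph = graph
        self.features = features
        self.pseudo_weight = pseudo_weight
        # V = lam * I + sum_i w_i x_i x_i^T  and  b = sum_i w_i r_i x_i
        self.v = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _add(self, x: Vector, reward: float, weight: float) -> None:
        for i, xi in enumerate(x):
            self.b[i] += weight * reward * xi
            for j, xj in enumerate(x):
                self.v[i][j] += weight * xi * xj

    def observe(self, key_term: str, reward: float) -> None:
        # Real feedback on the key-term the user was asked about.
        self._add(self.features[key_term], reward, 1.0)
        # Pseudo feedback (assumption): graph neighbours receive the same reward,
        # counted with a smaller weight.
        for neighbour in self.graph.get(key_term, []):
            self._add(self.features[neighbour], reward, self.pseudo_weight)

    def theta(self) -> Vector:
        # Ridge estimate of the user's preference vector: V^{-1} b.
        return solve(self.v, self.b)

    def uncertainty(self, key_term: str) -> float:
        # x^T V^{-1} x: larger values mean the key-term's direction is less explored.
        x = self.features[key_term]
        z = solve(self.v, x)
        return sum(xi * zi for xi, zi in zip(x, z))

    def select_key_term(self, candidates: List[str]) -> str:
        # Uncertainty-maximising rule, used here in place of optimal design.
        return max(candidates, key=self.uncertainty)


features = {"jazz": [1.0, 0.0], "blues": [0.9, 0.1], "metal": [0.0, 1.0]}
graph = {"jazz": ["blues"], "blues": ["jazz"], "metal": []}
est = GraphFeedbackEstimator(dim=2, graph=graph, features=features)
est.observe("jazz", 1.0)
print(est.select_key_term(["blues", "metal"]))  # -> metal
```

In this toy run, the answer about "jazz" also informs the related key-term "blues" through the pseudo feedback, so the uncertainty rule moves on to the unrelated "metal" instead of asking about "blues". Solving with `V` directly keeps the sketch dependency-free; a practical version would more likely maintain `V^{-1}` incrementally, for example with the Sherman-Morrison update.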

Hierarchical Conversational Preference Elicitation with Bandit Feedback

Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits

Knowledge-aware Conversational Preference Elicitation with Bandit Feedback

Conversational Contextual Bandit: Algorithm and Application

Clustering of Conversational Bandits for User Preference Learning and Elicitation

Robust and Efficient Algorithms for Conversational Contextual Bandit

Toward Joint Utilization of Absolute and Relative Bandit Feedback for Conversational Recommendation

Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation

Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations

Comparison-based Conversational Recommender System with Relative Bandit Feedback

Evaluating Online Bandit Exploration In Large-Scale Recommender System

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Conversational Dueling Bandits in Generalized Linear Models

The Nah Bandit: Modeling User Non-compliance in Recommendation Systems

Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications

Contextual Combinatorial Cascading Bandits

BiUCB: A Contextual Bandit Algorithm for Cold-Start and Diversified Recommendation

Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-start Users

Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments

Counterfactual contextual bandit for recommendation under delayed feedback