Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

Pedro Rodriguez, Paul Crook, Seungwhan Moon, Zhiguang Wang
DOI: https://doi.org/10.18653/v1/2020.emnlp-main.655
2020-11-10
Abstract: Open-ended human learning and information-seeking are increasingly mediated by digital assistants. However, such systems often ignore the user's pre-existing knowledge. Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know. Through crowd-sourcing of this experiment, we collect and release 14K dialogs (181K utterances) where users and assistants converse about geographic topics like geopolitical entities and locations. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. Responses using a user's prior knowledge increase engagement. We incorporate this knowledge into a multi-task model that reproduces human assistant policies and improves over a BERT content model by 13 mean reciprocal rank points.
Computation and Language
What problem does this paper attempt to address?
This paper asks how digital assistants can make information-seeking conversations more engaging by taking the user's existing knowledge into account. The researchers built a dataset named Curiosity to test the hypothesis that users engage more when the facts an assistant presents are related to what they already know. Through a crowdsourced Wizard-of-Oz experiment, they collected 14,048 dialogs (181,068 messages) about geographic topics such as geopolitical entities and locations. Each dialog is annotated with the user's prior knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. The study found that responses drawing on a user's prior knowledge do increase engagement. Using this dataset, the researchers also developed a multi-task model that imitates the policies of human assistants and, on some tasks, improves over a BERT content model by 13 mean reciprocal rank points.
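For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the rank at which the first correct item appears. The sketch below is an illustration only and is not taken from the paper's code: the function name `mean_reciprocal_rank` and the convention that a rank of 0 means "not found" are assumptions made for this example. It also assumes that MRR "points" refers to MRR multiplied by 100, which is a common reporting convention but is not stated in the abstract.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Return the mean reciprocal rank for a batch of queries.

    ranks: 1-based rank of the first correct item for each query.
           A rank of 0 (an assumed convention here) means the correct
           item was not retrieved and contributes 0 to the mean.
    """
    if not ranks:
        raise ValueError("ranks must contain at least one query")
    reciprocal_sum = sum(1.0 / r if r > 0 else 0.0 for r in ranks)
    return reciprocal_sum / len(ranks)


# Hypothetical ranks of the correct fact for three dialog turns.
example_ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)

# Assumed convention: "points" are MRR scaled to a 0-100 range.
mrr_points = 100.0 * mrr
```

Under that assumed convention, a 13-point improvement would correspond to a difference of 0.13 in raw MRR between the two models.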