For those who don't know (how) to ask: Building a dataset of technology questions for digital newcomers

Evan Lucas, Kelly S. Steelman, Leo C. Ureel, Charles Wallace
2024-03-27
Abstract: While the rise of large language models (LLMs) has created rich new opportunities to learn about digital technology, many on the margins of this technology struggle to gain and maintain competency due to lexical or conceptual barriers that prevent them from asking appropriate questions. Although there have been many efforts to understand factuality of LLM-created content and ability of LLMs to answer questions, it is not well understood how unclear or nonstandard language queries affect the model outputs. We propose the creation of a dataset that captures questions of digital newcomers and outsiders, utilizing data we have compiled from a decade's worth of one-on-one tutoring. In this paper we lay out our planned efforts and some potential uses of this dataset.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the lexical and conceptual barriers that keep digital newcomers from asking appropriate questions about technology, which in turn hinder their ability to gain and maintain technical competency. The researchers plan to build a dataset of newcomers' questions, drawn from a decade of one-on-one tutoring, to study how large language models (LLMs) respond to unclear or nonstandard queries and to improve those responses. Ultimately, they hope to develop an automated conversational teaching system comparable to personal tutoring, to help improve digital literacy.