Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Ziqiao Ma, Zekun Wang, Joyce Chai
2024-05-23
Abstract: Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent advancements in large language models have primarily adopted a non-interactive training paradigm, and refined pre-trained models through feedback afterward. In this work, we aim to examine how corrective feedback from interactions influences neural language acquisition from the ground up through systematically controlled experiments, assessing whether it contributes to learning efficiency in language models. We introduce a trial-and-demonstration (TnD) learning framework that incorporates three components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. Our experiments reveal that the TnD approach accelerates word acquisition for student models of equal and smaller numbers of parameters, and we highlight the significance of both trials and demonstrations. We further show that the teacher's choices of words influence students' word-specific learning efficiency, and a practice-makes-perfect effect is evident by a strong correlation between the frequency of words in trials and their respective learning curves. Our findings suggest that interactive language learning, with teacher demonstrations and student trials, can facilitate efficient word learning in language models.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines whether interactive learning with corrective feedback can make language models learn more efficiently. Inspired by the social nature of human language learning, in which caregivers provide demonstrations and feedback, the authors propose a trial-and-demonstration (TnD) learning framework with three components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. In TnD, the student model first attempts to generate text, and the teacher model then provides a corrected version as a demonstration. A reward function scores these outputs according to how well the language proficiency they show matches the student's developmental stage.

The experiments show that TnD accelerates word acquisition for student models of equal and smaller sizes, and that both trials and demonstrations matter. The teacher's word choices influence how efficiently the student learns specific words, and the frequency of a word in the student's trials correlates strongly with its learning curve. This practice-makes-perfect effect suggests that learning through language generation can speed up word learning. The paper also reports that interactive learning with corrective feedback is more effective than traditional non-interactive training at promoting early learning; as training progresses, however, the student model eventually converges to a performance level similar to the teacher model's. In short, the paper explores how simulating human interactive learning can improve the word learning efficiency of language models, and proposes an interactive learning framework for doing so.
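The three TnD components described above (student trial, teacher demonstration, stage-conditioned reward) can be illustrated with a deliberately tiny toy loop. This is a sketch under loud assumptions, not the authors' method: the unigram "student", the fixed teacher continuation `DEMO`, the overlap-based `stage_reward`, its linear stage expectation, and the additive `reinforce` update are all hypothetical stand-ins chosen so the example runs without any dependencies. The models, reward, and optimization procedure used in the paper are not specified in this summary.

```python
import random

DEMO = ["the", "dog", "barks"]         # hypothetical teacher continuation
VOCAB = DEMO + ["cat", "zzz", "qux"]   # hypothetical toy vocabulary


def student_trial(weights, length, rng):
    """Student trial: sample a continuation from the student's current word weights."""
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=length)


def stage_reward(text, demo, step, total_steps):
    """Toy reward (assumption): overlap with the teacher's words minus a stage expectation.

    The expectation rises linearly from 0 toward 0.5 over training, so the
    same text earns less reward at a later developmental stage.
    """
    overlap = sum(w in demo for w in text) / len(text)
    expected = 0.5 * step / total_steps
    return overlap - expected


def reinforce(weights, text, reward, lr=0.1, floor=1e-3):
    """Crude stand-in for an RL update: shift each word's weight by lr * reward."""
    for w in text:
        weights[w] = max(floor, weights[w] + lr * reward)


def train_tnd(total_steps=50, seed=0):
    """Run the toy loop: trial, demonstration, then reward-weighted updates on both."""
    rng = random.Random(seed)
    weights = {w: 1.0 for w in VOCAB}
    for step in range(total_steps):
        trial = student_trial(weights, len(DEMO), rng)  # 1. student trial
        demo = DEMO                                     # 2. teacher demonstration
        for text in (trial, demo):                      # 3. reward both, then update
            reinforce(weights, text, stage_reward(text, demo, step, total_steps))
    return weights


weights = train_tnd()
```

Printing `sorted(weights.items(), key=lambda kv: -kv[1])` after training lists the words the toy student now prefers. In this setup the words used in the teacher's demonstration end up with the highest weights, which loosely mirrors the summary's point that the teacher's word choices shape what the student picks up; it says nothing about the magnitude of the effects reported in the paper.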