SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT

Aditya Yadavalli, Alekhya Yadavalli, Vera Tobin
2023-05-31
Abstract: Second language acquisition (SLA) research has extensively studied cross-linguistic transfer, the influence of linguistic structure of a speaker's native language [L1] on the successful acquisition of a foreign language [L2]. Effects of such transfer can be positive (facilitating acquisition) or negative (impeding acquisition). We find that NLP literature has not given enough attention to the phenomenon of negative transfer. To understand patterns of both positive and negative transfer between L1 and L2, we model sequential second language acquisition in LMs. Further, we build a Multilingual Age Ordered CHILDES (MAO-CHILDES), a dataset consisting of 5 typologically diverse languages (German, French, Polish, Indonesian, and Japanese), to understand the degree to which native Child-Directed Speech (CDS) [L1] can help or conflict with English language acquisition [L2]. To examine the impact of native CDS, we use the TILT-based cross-lingual transfer learning approach established by Papadimitriou and Jurafsky (2020) and find that, as in human SLA, language family distance predicts more negative transfer. Additionally, we find that conversational speech data shows greater facilitation for language acquisition than scripted speech data. Our findings call for further research using our novel Transformer-based SLA models, and we would like to encourage it by releasing our code, data, and models.
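The abstract defines transfer by its direction: positive when L1 exposure facilitates L2 acquisition, negative when it impedes it. A minimal sketch of how that direction could be read off experimental scores follows, assuming a single L2 benchmark where higher is better and a baseline model evaluated on the same benchmark. The baseline choice, the `tolerance` threshold, and the names `classify_transfer` and `TransferEffect` are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass


# Hypothetical container for the outcome of one L1 -> L2 comparison.
@dataclass(frozen=True)
class TransferEffect:
    l1: str      # native language used before L2 training
    delta: float # L2 score with L1 exposure minus the baseline L2 score
    kind: str    # "positive", "negative", or "neutral"


def classify_transfer(l1: str, l2_score_with_l1: float, l2_score_baseline: float,
                      tolerance: float = 0.0) -> TransferEffect:
    """Label transfer by comparing an L1-exposed model against a baseline.

    Assumes higher scores are better (e.g. accuracy on an L2 benchmark).
    Differences within `tolerance` are treated as no measurable transfer.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    delta = l2_score_with_l1 - l2_score_baseline
    if delta > tolerance:
        kind = "positive"  # L1 exposure facilitated L2 acquisition
    elif delta < -tolerance:
        kind = "negative"  # L1 exposure impeded L2 acquisition
    else:
        kind = "neutral"
    return TransferEffect(l1, delta, kind)


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are not results from the paper.
    print(classify_transfer("L1_a", 0.72, 0.68, tolerance=0.01).kind)  # positive
```

Keeping the signed `delta` alongside the label lets the magnitude of transfer be compared across L1s, not just its direction.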
Computation and Language
What problem does this paper attempt to address?
The paper explores cross-linguistic transfer in second language acquisition (SLA) and simulates this process with language models. It focuses on the positive and negative influence of the native language (L1) on learning a target foreign language (L2), known as positive transfer and negative transfer, respectively. The paper addresses the following questions:

1. **Understanding the impact of cross-linguistic transfer**: How the native language influences the learning of a foreign language, particularly in vocabulary, pronunciation, and grammar, covering both positive transfer (facilitating learning) and negative transfer (hindering learning).
2. **Developing a new modeling framework**: The authors develop SLABERT, a framework built on BERT that models sequential second language acquisition, with the model exposed first to L1 data and then to L2 data.
3. **Creating the Multilingual Age Ordered CHILDES dataset (MAO-CHILDES)**: To study how native child-directed speech (CDS) affects the learning of English as a second language, the authors build a dataset covering five typologically diverse languages: German, French, Polish, Indonesian, and Japanese.
4. **Testing whether language family distance predicts transfer**: The study hypothesizes that the greater the structural difference between L1 and L2, the more pronounced the negative transfer, and tests this experimentally. As in human SLA, greater language family distance predicts more negative transfer.
5. **Comparing conversational and scripted adult-directed speech**: The paper compares conversational adult-directed speech (ADS) with scripted ADS to determine which type of data better supports language acquisition; conversational data shows greater facilitation.
In summary, the research empirically investigates the mechanisms of cross-linguistic transfer in second language acquisition, particularly transfer from the native language to the target language, and experimentally tests how structural similarity between languages and the type of speech data (conversational versus scripted) affect learning outcomes.