Saudi Learner Translation Corpus: The design and compilation of an English-Arabic learner translation corpus

Maha Al-Harthi, Amal Alsaif, Eman Al-Nafjan, Fatma Alshihri, Mahmoud Saleh
DOI: https://doi.org/10.1371/journal.pone.0303729
IF: 3.7
2024-10-23
PLoS ONE
Abstract: This article introduces the Saudi Learner Translation Corpus (SauLTC), an innovative multi-version English-Arabic parallel corpus featuring part-of-speech tagging. We describe the corpus parameters and compilation process and explain how textual processing and sentence alignment are conducted. The participants include 366 student translators, 48 instructors, and 23 alignment verifiers. The corpus provides access to two target versions of every source text (ST), allowing changes made during translation and revision to be traced from the initial to the final draft. The translations were collected over three years, yielding 5,160,386 tokens. The metadata of the 23 sentence alignment verifiers were added to the analysis as a unique variable to investigate individual differences in the manual verification process. This unidirectional corpus can be used to identify student translators' strategies and errors in translation and to analyze the efficacy of instructors' feedback. Furthermore, it is accessible via an application and a website. It provides translation teachers and researchers with a database that can help develop corpus-based and corpus-driven teaching materials.