The Jinan Chinese Learner Corpus.

Maolin Wang,Shervin Malmasi,Mingxuan Huang
DOI: https://doi.org/10.3115/v1/w15-0614
2015-01-01
Abstract:We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts produced by learners that can be used for educational tasks. The present work introduces the data and provides a detailed description. Currently, the corpus contains approximately 6 million Chinese characters written by students from over 50 different L1 backgrounds. This is a large-scale corpus of learner Chinese texts which is freely available to researchers either through a web interface or as a set of raw texts. The data can be used in NLP tasks including automatic essay grading, language transfer analysis and error detection and correction. It can also be used in applied and corpus linguistics to support Second Language Acquisition (SLA) research and the development of pedagogical resources. Practical applications of the data and future directions are discussed.
What problem does this paper attempt to address?