Natural language processing for similar languages, varieties, and dialects: A survey
Marcos Zampieri, Preslav Nakov, Yves Scherrer
DOI: https://doi.org/10.1017/S1351324920000492
IF: 1.841
2020-11-20
Natural Language Engineering
Abstract: There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing of similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.
computer science, artificial intelligence, linguistics