Rule-Based Machine Translation from Tunisian Dialect to Modern Standard Arabic
Mohamed Ali Sghaier, Mounir Zrigui
DOI: https://doi.org/10.1016/j.procs.2020.08.033
2020-01-01
Procedia Computer Science
Abstract: This paper presents a machine translation system that translates Tunisian Dialect (TD) text into Modern Standard Arabic (MSA) using a rule-based approach. Adopting this classical approach can yield better translations, mainly in terms of morphology and syntax. It also lets us automate translation in order to build a training dataset (a parallel corpus), which can then be exploited either for hybridization with the statistical approach or for the more recent neural-network approach. Our work is based on Apertium, a free and open-source platform that provides a complete environment with all the tools required to develop a rule-based machine translation (RBMT) system. Translation starts from TD input text, which passes through three main stages: (1) morphological analysis and disambiguation, (2) lexical and structural transfer, and (3) morphological generation and spelling correction, which produces the translated text in MSA. Each of these stages poses its own challenges. We made a considerable effort to create the required resources from scratch: the two monolingual morphological dictionaries for TD and MSA, the bilingual lexical dictionary that maps TD words to their MSA equivalents, and more than 500 rules for disambiguation and structural transfer. The proposed system is evaluated with three metrics, WER, TER and BLEU, on which it reaches 23.28%, 23.85% and 55.22 respectively, which we consider promising scores.
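The three-stage pipeline described in the abstract can be illustrated with a toy Python sketch. Everything in it is hypothetical: the dictionary names (`TD_MONODIX`, `BIDIX`, `MSA_MONODIX`), their entries, the tag names (borrowed loosely from Apertium conventions such as `n` and `vblex`), and the single structural rule that moves a post-posed TD demonstrative ahead of its noun (`الكتاب هذا` becomes `هذا الكتاب`) are illustrative assumptions, not the authors' resources or rules. Disambiguation simply keeps the first analysis, standing in for Apertium's tagger or constraint-grammar rules, and spelling correction is omitted.

```python
from typing import Dict, List, Tuple

Tags = Tuple[str, ...]
Analysis = Tuple[str, Tags]  # (lemma, morphological tags)

# Hypothetical TD monolingual dictionary: surface form -> candidate analyses.
TD_MONODIX: Dict[str, List[Analysis]] = {
    "الكتاب": [("كتاب", ("n", "def"))],
    "هذا": [("هذا", ("dem", "m", "sg"))],
    "نحب": [("حب", ("vblex", "pres", "p1", "sg"))],
    "برشا": [("برشا", ("adv",))],
}

# Hypothetical bilingual dictionary: TD lemma -> MSA lemma.
BIDIX: Dict[str, str] = {"كتاب": "كتاب", "هذا": "هذا", "حب": "أحب", "برشا": "كثيرا"}

# Hypothetical MSA generator: (MSA lemma, tags) -> surface form.
MSA_MONODIX: Dict[Analysis, str] = {
    ("كتاب", ("n", "def")): "الكتاب",
    ("هذا", ("dem", "m", "sg")): "هذا",
    ("أحب", ("vblex", "pres", "p1", "sg")): "أحب",
    ("كثيرا", ("adv",)): "كثيرا",
}


def analyse(tokens: List[str]) -> List[List[Analysis]]:
    """Morphological analysis; unknown words get a single 'unk' analysis."""
    return [TD_MONODIX.get(tok, [(tok, ("unk",))]) for tok in tokens]


def disambiguate(candidates: List[List[Analysis]]) -> List[Analysis]:
    """Stand-in for a tagger or constraint grammar: keep the first analysis."""
    return [cands[0] for cands in candidates]


def lexical_transfer(analyses: List[Analysis]) -> List[Analysis]:
    """Map each TD lemma to its MSA lemma, keeping the tags unchanged."""
    return [(BIDIX.get(lemma, lemma), tags) for lemma, tags in analyses]


def structural_transfer(analyses: List[Analysis]) -> List[Analysis]:
    """Hypothetical rule: noun + post-posed demonstrative -> demonstrative + noun."""
    out = list(analyses)
    i = 0
    while i < len(out) - 1:
        if out[i][1][0] == "n" and out[i + 1][1][0] == "dem":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the reordered pair
        else:
            i += 1
    return out


def generate(analyses: List[Analysis]) -> List[str]:
    """Morphological generation; fall back to the lemma if no form is listed."""
    return [MSA_MONODIX.get(a, a[0]) for a in analyses]


def translate(td_text: str) -> str:
    """Run the toy pipeline on whitespace-tokenized TD text."""
    analyses = disambiguate(analyse(td_text.split()))
    return " ".join(generate(structural_transfer(lexical_transfer(analyses))))


print(translate("الكتاب هذا"))  # هذا الكتاب
print(translate("نحب برشا"))    # أحب كثيرا
```

Keeping each stage as a separate function mirrors the modular design the abstract attributes to Apertium, where each stage can be developed and tested on its own; in a real Apertium pair these resources live in XML dictionary and transfer-rule files rather than Python dictionaries.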
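Under the standard definition, the WER figure reported in the abstract is the word-level edit distance (substitutions, deletions and insertions) between system output and reference translation, divided by the number of reference words. A WER of 23.28% therefore corresponds to roughly 23 word edits per 100 reference words. The sketch below computes corpus-level WER under that definition, assuming whitespace tokenization; the abstract does not name the scoring tool, so this is not necessarily the authors' implementation. TER additionally counts a shift of a block of words as a single edit, which is not implemented here, and BLEU is usually computed with an existing toolkit such as sacrebleu rather than by hand.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum word-level substitutions + deletions + insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hyp
                cur[j - 1] + 1,          # insertion: extra word in hyp
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_wer(hyps: List[str], refs: List[str]) -> float:
    """Corpus WER in percent: total edits divided by total reference words."""
    if len(hyps) != len(refs):
        raise ValueError("need one hypothesis per reference")
    edits = sum(word_edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * edits / ref_words


print(corpus_wer(["a b c"], ["a x c d"]))  # 50.0
```

Summing edits and reference words over the whole corpus before dividing (rather than averaging per-sentence WER) keeps long sentences from being under-weighted, which is the usual convention for corpus-level scores.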