From Neural Machine Translation to Large Language Models: Analysing Translation Quality of Chinese Idioms
Sofiia Denysiuk, Yafei Zhu, Daisy Monika Lal, Ruslan Mitkov
DOI: https://doi.org/10.26615/issn.2815-4711.2024_021
Abstract: Idioms present a formidable challenge for machine translation (MT) because they are figurative, culture-specific, and linguistically complex. In this study, we compiled a corpus of 100 Chinese idioms from the Dictionary of Chinese Idioms and conducted a quantitative analysis of nine state-of-the-art MT systems. To account for the linguistic complexity of idioms, we introduce AIE, a new evaluation metric for translations named after its three assessment criteria: Accuracy, Intelligibility, and Elegance. Within this framework, we propose assigning distinct weights to the three criteria, supported by empirical evidence. We also used the automatic metrics ROUGE, BLEU, BLEURT, and METEOR to assess translation quality. Our analysis showed that although BLEURT and BLEU correlated more strongly with human scores than the other two metrics, the overall correlation remained weak. Given the importance of automatic evaluation in natural language processing (NLP), we hypothesised that combining existing automatic metrics could yield better assessments than any individual metric. To test this hypothesis, we computed the average of the automatic metric scores, which correlated positively with human scores, suggesting a promising alternative. Our findings indicate that GPT4 and GLM4 outperform the other state-of-the-art models, even when translating less commonly used idioms.
Keywords: Chinese idioms · Machine translation · Large language model.
Linguistics, Computer Science
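
The two computations the abstract describes, a weighted AIE score and an average of automatic metrics correlated against human scores, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the weights `(0.5, 0.3, 0.2)`, the function names, and the toy numbers are all assumptions, since the abstract does not state the actual weights or data. It also assumes the automatic metric scores have already been brought onto comparable scales before averaging.

```python
from math import sqrt
from statistics import mean


def aie_score(accuracy, intelligibility, elegance, weights=(0.5, 0.3, 0.2)):
    """Weighted combination of the three AIE criteria.

    The default weights are placeholders, not the values used in the paper.
    """
    w_a, w_i, w_e = weights
    total = w_a + w_i + w_e
    return (w_a * accuracy + w_i * intelligibility + w_e * elegance) / total


def combined_metric(scores):
    """Unweighted average of automatic metric scores for one translation.

    `scores` maps a metric name (e.g. "BLEU") to its value.
    """
    return mean(scores.values())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values for three translations; purely illustrative, not from the paper.
    human = [aie_score(5, 4, 3), aie_score(3, 3, 2), aie_score(1, 2, 1)]
    automatic = [
        combined_metric({"ROUGE": 0.6, "BLEU": 0.4, "BLEURT": 0.7, "METEOR": 0.5}),
        combined_metric({"ROUGE": 0.4, "BLEU": 0.2, "BLEURT": 0.5, "METEOR": 0.3}),
        combined_metric({"ROUGE": 0.2, "BLEU": 0.1, "BLEURT": 0.3, "METEOR": 0.2}),
    ]
    print(pearson(human, automatic))
```

Dividing by the sum of the weights keeps the AIE score on the same scale as the individual criteria even if the chosen weights do not sum to one. A rank-based coefficient such as Spearman's could be substituted for `pearson` if the relationship is not expected to be linear.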