Abstract: Technology for converting Chinese to Braille is of great importance. Paired with a Braille display, it can better meet the educational and daily needs of visually impaired people, especially children and students, and adding visual assistance mechanisms can further improve the user experience and provide comprehensive support for people with visual impairments. In recent years, end-to-end neural machine translation models have gained traction for Chinese–Braille translation. However, training robust models for this task requires large, high-quality, domain-specific parallel data, and the existing Chinese–Braille parallel data is insufficient to achieve satisfactory results. To address this challenge, this paper proposes integrating pre-trained models into the Chinese–Braille translation task. This is the first application of pre-training to this task, and our approach differs from traditional pre-training: previous pre-training methods in natural language processing mainly used raw text data, which we found to be of limited help for Chinese–Braille translation. We therefore propose three novel forms of pre-training data instead of relying solely on raw text. With the Transformer model, our approach reaches a BLEU score of up to 94.53 on a 10k-sentence parallel corpus, opening a new direction for Chinese–Braille translation research. Furthermore, we introduce a new form of data that enables Chinese–Braille translation using only an encoder. With the MacBERT model, this approach achieves a BLEU score of 98.87 on the test set and runs inference 54 times faster than the Transformer model. These findings have significant implications for Chinese–Braille translation and offer insights for future research.
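The results above are reported as BLEU scores. For readers unfamiliar with the metric, the following is a minimal sketch of standard corpus-level BLEU (clipped n-gram precision up to 4-grams, combined by geometric mean and scaled by a brevity penalty). It assumes one reference per sentence, pre-tokenized input, and no smoothing; the function names are illustrative. The abstract does not state which BLEU implementation or Braille tokenization the authors used, so scores from this sketch are not directly comparable to 94.53 or 98.87.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence, scaled to 0-100.

    hypotheses, references: lists of token lists, aligned by index.
    Assumption: no smoothing, so any n-gram order with zero matches gives 0.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For Braille output, one hypothetical tokenization is to split each sentence into individual Braille cells with `list(...)`; word-level or syllable-level tokenization would give different scores, which is one reason published BLEU figures depend on the exact evaluation tool.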

Multilingual Denoising Pre-training for Neural Machine Translation

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training

Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation

Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation

Multimodal Pretraining from Monolingual to Multilingual

Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs

Rethinking Denoised Auto-Encoding in Language Pre-Training

Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding

Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning

Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information

Improving Neural Machine Translation by Bidirectional Training

DEEP: DEnoising Entity Pre-training for Neural Machine Translation

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Pre-training model for low-resource Chinese–Braille translation

A Study for Enhancing Low-resource Thai-Myanmar-English Neural Machine Translation

XLNet: Generalized Autoregressive Pretraining for Language Understanding