Abstract: English has become one of the most widely used languages in the world, and the lack of a good translation mechanism for it causes difficulties in both study and daily life. Major platforms around the world are now working on English translation strategies, and translation platforms from different regions use different translation mechanisms. The translation data they produce are large-scale, multisource, heterogeneous, high-dimensional, and of poor quality. Such inconsistent data increase both translation difficulty and translation time, so the quality of translation data must be improved to achieve better translation results. Providing a large-scale, efficient translation strategy requires integrating the strategies of the various platforms and performing machine-learning-based cleaning and fusion of heterogeneous translation data. First, this paper represents the multisource, heterogeneous translation data model as tree-augmented naive Bayes networks (TANs). Learning the TAN structure and the probability distribution of input attributes and tuples naturally captures the relationships between the datasets, and the resulting data probability values are used to classify translation data for cleaning. Then, a multisource, heterogeneous translation data fusion model based on a recurrent neural network (RNN) is constructed; the RNN controls the node data of the hidden layer to enhance fault tolerance during fusion. Finally, experimental results show that the TANs-based translation data cleaning method improves the cleaning rate by approximately 10% on average and reduces cleaning time by about 5% on average. In addition, the RNN-based multisource translation data fusion method overcomes shortcomings of the traditional fusion model and improves its practicability in terms of root mean square error (RMSE), mean absolute percentage error (MAPE), fusion time, and integrity.
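
The abstract names RMSE and MAPE as evaluation metrics for the fusion model but does not state their formulas. A minimal sketch using the standard textbook definitions is shown below; the function names `rmse` and `mape` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent.

    Every value in `actual` must be nonzero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


if __name__ == "__main__":
    # Hypothetical reference values and fused outputs, for illustration only.
    reference = [100.0, 200.0, 400.0]
    fused = [110.0, 180.0, 400.0]
    print("RMSE:", rmse(reference, fused))
    print("MAPE (%):", mape(reference, fused))
```

Lower values of both metrics indicate that the fused output is closer to the reference. MAPE is scale-independent but breaks down when reference values are zero, which is why the sketch rejects such inputs.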

AI-Based Heterogenous Large-Scale English Translation Strategy
