Bilingual Word Embedding with Sentence Combination CNN for 1-to-n Sentence Alignment.
Xinyuan Ren, Xiangling Fu, Xuesi Zhou, Chunsheng Liu, Songfeng Gao, Lei Peng
DOI: https://doi.org/10.1145/3443279.3443287
2020-01-01
Abstract: Sentence alignment, one of the most active and fundamental tasks in natural language processing (NLP), is usually approached with one of two categories of methods: traditional methods, which were proposed first, and neural network methods, which were adopted later. Because the existing mainstream corpora are mostly in 1-to-1 form, the alignment models that perform relatively well apply mainly to 1-to-1 sentence alignment. However, when a sentence contains too much information, 1-to-N sentence alignment can benefit sentence translation tasks more than the 1-to-1 form, since it is more flexible and can reduce the complexity of the original sentence. We therefore attempt to adapt neural networks that perform relatively well in the 1-to-1 case to the 1-to-N case. In this paper, we propose a novel 1-N Bilingual word Embedding with Sentence Combination CNN Improved Framework (1-NBESCC) to align 1-to-N sentences more precisely. Experiments show that the proposed model performs as well as traditional methods such as BLEUALIGN in the 1-to-1 setting and much better in the 1-to-N setting.