A constrained hierarchical rule extraction method based on phrase collocations and high-frequency backbone words

Jinsong Su, Yajuan Lv, Qun Liu
2009-01-01
Abstract: The hierarchical phrase-based machine translation model is a popular translation model that combines the advantages of phrase-based and syntax-based translation models. However, because the current hierarchical phrase extraction procedure imposes no linguistic constraints, it extracts a large number of redundant generalized rules. In this paper, we propose two strategies to constrain the extraction of hierarchical rules and reduce the number of redundant rules. First, we identify phrase collocations using the log-likelihood ratio and require that each phrase collocation be kept together as a single unit during extraction. Second, we identify backbone words by their frequency and impose the constraint that sub-phrases consisting only of backbone words cannot be replaced with variables. Experimental results show that our methods substantially reduce the number of generalized rules without a significant decrease in BLEU score.
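The two constraints can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard log-likelihood ratio statistic computed from a 2x2 contingency table of co-occurrence counts, it scores a single pair of units rather than full phrase collocations, and it treats the `top_k` most frequent tokens as backbone words. The function names, the count-based interface, and the `top_k` cutoff are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def log_likelihood_ratio(c12, c1, c2, n):
    """Log-likelihood ratio (G^2) for a pair of units (assumed formulation).

    c12: times the two units co-occur, c1 / c2: times each unit occurs,
    n: total number of observations.
    """
    # Observed cells of the 2x2 contingency table.
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    # Expected cells if the two units were independent.
    expected = [
        c1 * c2 / n,
        c1 * (n - c2) / n,
        (n - c1) * c2 / n,
        (n - c1) * (n - c2) / n,
    ]
    # Empty cells contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def backbone_words(tokens, top_k):
    """Treat the top_k most frequent tokens as backbone words (assumption)."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def may_replace_with_variable(sub_phrase, backbone):
    """A sub-phrase made up only of backbone words must not become a variable."""
    return not all(word in backbone for word in sub_phrase)
```

A pair whose co-occurrence count matches what independence predicts scores zero, and the score grows as the pair co-occurs more often than chance. A rule extractor could then keep pairs above a chosen threshold as indivisible units and call `may_replace_with_variable` before generalizing a sub-phrase.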