A Breadth-First Representation for Tree Matching in Large Scale Forest-Based Translation.

Sumukh Ghodke, Steven Bird, Rui Zhang
2011-01-01
Abstract: Efficient data structures are necessary for searching large translation rule dictionaries in forest-based machine translation. We propose a breadth-first representation of tree structures that allows trees to be stored and accessed efficiently. We describe an algorithm that allows incremental search for trees in a forest and show that it is orders of magnitude faster than iterative search. A B-tree index is used to store the rule dictionaries. Prefix-compressed indexes with a large page size are found to provide a balance of fast search and disk space utilisation.
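
To make the idea of a breadth-first tree representation concrete, here is a minimal sketch in Python. It is an illustration only, not the encoding used in the paper, which the abstract does not specify. The names `breadth_first_key` and `rebuild`, the `(label, children)` tree shape, and the choice to store a child count next to each label are all assumptions made for this example.

```python
from collections import deque


def breadth_first_key(tree):
    """Flatten a (label, children) tree into level-order (label, arity) pairs.

    Hypothetical helper: storing the child count with each label is one
    simple way to make the flat sequence reversible.
    """
    key = []
    queue = deque([tree])
    while queue:
        label, children = queue.popleft()
        key.append((label, len(children)))
        queue.extend(children)  # children are visited after the current level
    return tuple(key)


def rebuild(key):
    """Reconstruct the (label, children) tree from its level-order key."""
    nodes = [(label, []) for label, _ in key]
    next_child = 1  # index of the first node not yet attached to a parent
    for (_, arity), node in zip(key, nodes):
        node[1].extend(nodes[next_child:next_child + arity])
        next_child += arity
    return nodes[0]


# A small parse-tree fragment: S -> NP VP, NP -> DT NN, VP -> VBZ
example = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VBZ", [])])])
example_key = breadth_first_key(example)
```

In this sketch the upper levels of a tree come first in the key, so two trees whose top levels agree in both labels and child counts share a leading run of pairs. That is the kind of regularity a sorted, prefix-compressed index could take advantage of, although how the paper actually lays out its keys is not stated in the abstract.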