A corpus-based approach to English subordinate clause identification

Jianmin Zhang, Tiejun Zhao, Sheng Li, Jianmin Yao
DOI: https://doi.org/10.3772/j.issn.1006-6748.2001.01.003
2001-01-01
Abstract: The complex sentence structure of English is a bottleneck for our practical machine translation (MT) system. Simplifying English subordinate clauses greatly relieves the burden of parsing and of other grammatical or semantic analysis of a complex sentence, and thus improves the output quality of the MT system. To our knowledge, however, no satisfactory research results have been reported in this field so far. This paper reports the authors' work on a corpus-based approach to English subordinate clause identification. The approach integrates rule-based and statistical methods to find the left and right boundaries of subordinate clauses. The Penn Treebank corpus is used as the training standard. Precision and recall of subordinate clause identification are tested on both closed and open corpora. The closed test yields 92.9% precision and 91.26% recall, and the open test yields 80.34% precision and 83.93% recall. The algorithm has been integrated into our machine translation system. The method can also be applied to processing any other language.