Grammar induction by MDL-based distributional classification

Yikun Guo, Fuliang Weng, Lide Wu
DOI: https://doi.org/10.1007/1-4020-2295-6_14
2004-01-01
Abstract: This chapter describes our grammar induction work using the Minimum Description Length (MDL) principle. We start with a diagnostic comparison between a basic best-first MDL induction algorithm and a pseudo induction process, which reveals problems with the existing MDL-based approach to grammar induction. Based on this comparison, we present a novel two-stage grammar induction algorithm that overcomes a local-minimum problem in the basic algorithm by clustering the left-hand sides of the induced grammar rules with a seed grammar. Preliminary experimental results show that the resulting induction curve is significantly better than that of traditional MDL-based grammar induction and, in a diagnostic comparison, is very close to the ideal case. In addition, the new algorithm induces grammar rules with high precision. Finally, we discuss future research directions for improving both the recall and the precision of the algorithm.
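To make the MDL objective concrete, the sketch below shows a generic best-first MDL grammar induction loop of the kind the abstract calls the "basic" algorithm. It is not the authors' two-stage method, and the coding scheme is an assumption: every grammar and corpus symbol is charged log2(|alphabet| + 1) bits under a uniform code (the +1 being a delimiter), and the only search move considered is replacing an adjacent symbol pair with a new nonterminal. The function names (description_length, substitute, best_first_step, induce) and the nonterminal names X0, X1, ... are hypothetical.

```python
import math
from collections import Counter


def description_length(rules, corpus):
    """Two-part code length in bits: L(G) + L(D | G).

    Assumption: a uniform code over all symbols in use plus one delimiter.
    """
    symbols = {s for sent in corpus for s in sent}
    for lhs, rhs in rules:
        symbols.add(lhs)
        symbols.update(rhs)
    bits_per_symbol = math.log2(len(symbols) + 1)  # +1 for the delimiter
    grammar_symbols = sum(1 + len(rhs) + 1 for _, rhs in rules)  # LHS, RHS, delimiter
    data_symbols = sum(len(sent) + 1 for sent in corpus)  # sentence, delimiter
    return (grammar_symbols + data_symbols) * bits_per_symbol


def substitute(sent, pair, nt):
    """Rewrite non-overlapping occurrences of `pair` in `sent` as `nt`."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
            out.append(nt)
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return tuple(out)


def best_first_step(rules, corpus, nt):
    """Try each adjacent pair as a new rule nt -> a b and keep the one that
    lowers the total description length the most; return None if none does."""
    base = description_length(rules, corpus)
    pairs = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
    best = None
    for pair in pairs:
        new_rules = rules + [(nt, pair)]
        new_corpus = [substitute(s, pair, nt) for s in corpus]
        dl = description_length(new_rules, new_corpus)
        if dl < base and (best is None or dl < best[0]):
            best = (dl, new_rules, new_corpus)
    return best


def induce(corpus, max_rules=10):
    """Greedy (best-first) induction: apply the best step until none helps."""
    rules, corpus = [], [tuple(s) for s in corpus]
    for k in range(max_rules):
        step = best_first_step(rules, corpus, f"X{k}")
        if step is None:  # no single step shortens the code: greedy search halts
            break
        _, rules, corpus = step
    return rules, corpus
```

On a toy corpus of five copies of "the dog barks" and five of "the dog runs", this sketch introduces X0 -> the dog and then stops, because no further single substitution shortens the two-part code. Halting wherever no single greedy move helps is how such a search can end in a local minimum, the problem the chapter's two-stage algorithm is designed to overcome.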