A Model for Linguistic Knowledge Discovery from Large-Scale Corpuses Based on Rough Set Techniques

陈清才,王晓龙,赵健
DOI: https://doi.org/10.3969/j.issn.1007-130x.2004.05.017
2004-01-01
Abstract: This paper first introduces a linguistic feature table (LFT) to structure textual information and to represent long-distance constraints. Redundant information in the LFT is then removed by an object-oriented data generalization algorithm, inconsistent objects are filtered out by the rule extraction algorithm, and a consistent and efficient rule base is constructed for NLP applications. Finally, applications to Chinese word sense disambiguation and Chinese pinyin-to-character conversion are presented. With a dynamic rule smoothing algorithm, the experiments achieve decision precisions of 0.93 and 0.95 and rule recall rates of 0.92 and 0.89 on these two applications, respectively, which indicates that the model performs well.
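The abstract does not spell out the generalization or rule extraction algorithms, so the following is only a minimal sketch of the general rough-set idea it alludes to: objects that share the same condition attributes but carry different decisions are treated as inconsistent and dropped, and the remaining groups become rules. The table contents, attribute names (`prev`, `next`), sense labels, and the functions `extract_consistent_rules` and `apply_rules` are all hypothetical, invented for illustration, and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy feature table: each row is (condition attributes, decision).
# Attribute names and values are invented for illustration only.
ROWS = [
    ({"prev": "da", "next": "ren"}, "sense_A"),
    ({"prev": "da", "next": "ren"}, "sense_A"),   # duplicate (redundant) object
    ({"prev": "xiao", "next": "ren"}, "sense_B"),
    ({"prev": "lao", "next": "shi"}, "sense_A"),
    ({"prev": "lao", "next": "shi"}, "sense_B"),  # conflicts with the row above
]


def extract_consistent_rules(rows):
    """Group objects by condition attributes; keep groups with a single decision."""
    decisions = defaultdict(set)
    for conditions, decision in rows:
        # Sort the items so that equal attribute sets map to the same key.
        key = tuple(sorted(conditions.items()))
        decisions[key].add(decision)

    rules = {}
    for key, outcomes in decisions.items():
        # A group with more than one decision is inconsistent and is filtered out.
        if len(outcomes) == 1:
            rules[key] = next(iter(outcomes))
    return rules


def apply_rules(rules, conditions):
    """Return the decision for matching conditions, or None if no rule applies."""
    return rules.get(tuple(sorted(conditions.items())))


RULES = extract_consistent_rules(ROWS)
```

In this sketch, duplicate rows collapse into one rule and the conflicting pair yields no rule at all, so `apply_rules` returns `None` for it. A smoothing step such as the one the abstract mentions would presumably handle those uncovered cases, but its details are not given here.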