A co-training based method for chinese patent semantic annotation.

Xu Chen,Zhiyong Peng,Cheng Zeng
DOI: https://doi.org/10.1145/2396761.2398645
2012-01-01
Abstract:Patents are public and scientific literatures protected by the law, and their abstracts highly contained valuable information. Patent's semantic annotation can effectively protect intellectual property rights and promote corporations' scientific research innovation. Currently, automatic patent annotation mainly used supervised machine learning algorithms, which required abundant expensive labeled patent data. Due to lack of enough labeled Chinese patent data, this paper adopted a semi-supervised machine learning method named co-training, which started from a little labeled data. This method combined keyword extraction with list extraction, and incrementally annotated functional clauses in patent abstract. Experiment results indicated this method can gradually improve the recall without sacrificing the precision.
What problem does this paper attempt to address?