A New Method for Rare Feature Extraction in Patent Documents

Mengzhuo Guo, Hua Yuan, Yu Qian
DOI: https://doi.org/10.1109/icsssm.2016.7538569
2016-01-01
Abstract: Patent documents, as large corpora, potentially contain a great deal of knowledge. To extract more valid information efficiently from this massive, unstructured data, a new research area called patent mining, which aims to analyze patent documents, has emerged in recent years. Popular methods are generally based on term frequency, so an appropriately chosen predefined threshold plays an important role in obtaining valid information during the mining process. However, such a mining process leaves out rare features whose frequency falls below the threshold. In this paper, we propose a new method, weighted max confidence, to detect rare features in massive patent documents. The experimental results show that the method can improve rare-feature mining results and achieve higher precision.
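To make the threshold problem concrete, here is a minimal sketch on a made-up toy corpus (the documents, the term sets and the helper names `frequent_terms` and `max_confidence` are all hypothetical). The abstract does not say how the weights in "weighted max confidence" are defined, so the sketch uses only the standard, unweighted max-confidence measure from association analysis, max(P(B|A), P(A|B)). It is not the authors' method. It shows a pair of terms that a document-frequency threshold discards even though the two terms always co-occur.

```python
from collections import Counter

# Hypothetical toy corpus: each patent document reduced to a set of candidate feature terms.
docs = [
    {"battery", "electrode", "lithium"},
    {"battery", "electrode", "separator"},
    {"battery", "lithium", "anode"},
    {"battery", "electrode", "anode"},
    {"graphene", "coating"},
    {"graphene", "coating", "battery"},
]


def frequent_terms(docs, min_support):
    """Threshold-based filtering: keep terms whose document frequency is >= min_support."""
    df = Counter(term for doc in docs for term in doc)
    return {term for term, count in df.items() if count >= min_support}


def max_confidence(docs, a, b):
    """Standard (unweighted) max confidence: max(P(b|a), P(a|b)) over documents."""
    sup_a = sum(1 for d in docs if a in d)
    sup_b = sum(1 for d in docs if b in d)
    sup_ab = sum(1 for d in docs if a in d and b in d)
    if sup_a == 0 or sup_b == 0:
        return 0.0  # no evidence for one of the terms
    return max(sup_ab / sup_a, sup_ab / sup_b)


if __name__ == "__main__":
    # With a threshold of 3, "graphene" and "coating" (2 documents each) are dropped.
    print(sorted(frequent_terms(docs, min_support=3)))
    # Yet they co-occur in every document that mentions either, so max confidence is 1.0.
    print(max_confidence(docs, "graphene", "coating"))
    # A weaker association for comparison: 1 shared document out of 5 and 2.
    print(max_confidence(docs, "battery", "coating"))
```

Because max confidence divides by the support of each term rather than by the corpus size, a pair of infrequent terms can still score highly, which is the kind of signal a pure frequency threshold cannot see.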