Design and Implementation of the Patent Topical Web Crawler System

Yunqing Hu, Qingying Qiu, Xiu Yu
DOI: https://doi.org/10.1145/3239283.3239326
2018-01-01
Abstract: To provide a knowledge source for patent-based computer-aided innovative product design, a patent topical web crawler was designed and developed to collect patent information from the US Patent and Trademark Office (USPTO). This paper describes the overall design and workflow of the patent topical crawler, including its basic functional architecture and key system technologies. It also proposes a Doc2Vec-based method for calculating the similarity of short patent texts, which is used to judge topic relevance and effectively screen the required patent data. The experimental results show that the crawler has high acquisition efficiency and applicability.
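The relevance screening step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so everything here is an assumption: the function names (`embed`, `cosine_similarity`, `is_relevant`), the threshold value of 0.3, and the sample texts are all hypothetical. To keep the example self-contained, a bag-of-words count vector stands in for the Doc2Vec embedding; in a real system the vectors would presumably come from a trained Doc2Vec model instead.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT Doc2Vec).

    Assumption: a real crawler would replace this with a vector
    inferred by a trained Doc2Vec model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def is_relevant(topic, candidate, threshold=0.3):
    """Keep a candidate when its similarity to the topic meets the threshold.

    The threshold of 0.3 is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(embed(topic), embed(candidate)) >= threshold


# Hypothetical topic description and patent titles.
topic = "gear transmission mechanism for torque transfer"
titles = [
    "Planetary gear transmission with improved torque capacity",
    "Method for brewing coffee with a paper filter",
]
kept = [t for t in titles if is_relevant(topic, t)]
```

With these sample inputs, only the first title passes the filter. Swapping in a learned embedding would change only `embed`; the thresholded cosine comparison would stay the same.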