Abstract: In recent years, the explosive growth of patent applications has drawn extensive attention to patent mining. A key problem in patent mining is recognizing the technologies contained in patents, which is a fundamental preparation for deeper analysis. To this end, this paper focuses on constructing a technology portrait for each patent, i.e., recognizing the technical phrases it contains, which summarize and represent the patent from a technical perspective. A critical challenge along this line is how to analyze the unique characteristics of technical phrases and capture them in precise descriptions. We therefore first produce detailed descriptions of the technical phrases found in a large body of patents, based on several criteria: previous work, practical experience, and statistical analysis. Then, considering the unique characteristics of technical phrases and the complex structure of patent documents, such as multi-aspect semantics and multi-level relevances, we propose a novel unsupervised model, TechPat, which automatically recognizes technical phrases from massive patent collections without the need for expensive human labeling. We then evaluate the extraction results from several aspects. Specifically, we propose a novel evaluation metric, Information Retrieval Efficiency (IRE), to quantify the quality of extracted technical phrases from a new perspective. Extensive experiments on real-world patent data demonstrate that TechPat effectively discriminates technical phrases in patents and greatly outperforms existing methods. We further apply the extracted technical phrases to two practical tasks, patent search and patent classification, where the experimental results confirm their wide application prospects. Finally, we discuss the generalization ability of the proposed methods.

Extraction Approach of Patent Information Based on Regular Expression

Extracting Information from Chinese Prescription Pharmaceuticals Based on NPOS Shortest-Path Word Segmentation Algorithm

An Ontology-Based Automatic Semantic Annotation Approach for Patent Document Retrieval in Product Innovation Design

A Semantic Query Expansion-Based Patent Retrieval Approach

Exploiting Semantic Knowledge Base for Patent Retrieval

Decoding Patent Information Using Patent Maps

The patent mining analysis method based on Chinese word segmentation

Technical Phrase Extraction for Patent Mining: A Multi-level Approach

A patent retrieval method based on automatic query expansion

TechPat: Technical Phrase Extraction for Patent Mining

Automatic Abstraction of Long Chinese Patent Texts Based on P-Bertsum Model

Enterprise Collaborative Platform for Patents Analysis

A Patent Keyword Extraction Method Based on Corpus Classification

Experimental Study of Patent Information Content Mining

An Effective Method and Its Implementation for Automatic Extraction of Part Information from Engineering Drawings

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

Information and Strategies Supplied Patent Management Platform Based on Network

Ontology-based Patent Retrieval Technologies

Various Legal Factors Extraction Based on Machine Reading Comprehension

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

Chinese technical terminology extraction based on DC-value and information entropy