Abstract: We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. As patent documents employ legal and highly technical language, existing semantic textual similarity methods that use localized contextual information do not perform satisfactorily in inferring patent phrase similarity. To address this, we introduce a graph-augmented approach to amplify the global contextual information of the patent phrases. For each patent phrase, we construct a phrase graph that links to its focal patents and a list of patents that are either cited by or cite these focal patents. The augmented phrase embedding is then derived from combining its localized contextual embedding with its global embedding within the phrase graph. We further propose a self-supervised learning objective that capitalizes on the retrieved topology to refine both the contextualized embedding and the graph parameters in an end-to-end manner. Experimental results from a unique patent phrase similarity dataset demonstrate that our approach significantly enhances the representation of patent phrases, resulting in marked improvements in similarity inference in a self-supervised fashion. Substantial improvements are also observed in the supervised setting, underscoring the potential benefits of leveraging retrieved phrase graph augmentation.

What problem does this paper attempt to address?

The paper addresses the difficulty of inferring patent phrase similarity. Because patent documents use legal and highly technical language, existing semantic textual similarity methods that rely on local contextual information (such as Sentence-BERT or SimCSE) perform poorly on this task. Collecting a large number of expert annotations for supervised training is also hard: it is costly and requires in-depth knowledge of the patent domain.

To address both issues, the authors propose a retrieval-augmented, graph-based method for learning patent phrase representations. By adding global contextual information to each phrase, the method significantly improves patent phrase similarity inference. Its main contributions are:

1. **Constructing Phrase Graphs**: For each patent phrase, construct a phrase graph that contains its focal patents and the patents that cite or are cited by them. The representation of a phrase then depends on global contextual information as well as local context.
2. **Self-Supervised Learning Objective**: Use the topology of the phrase graph to define a self-supervised learning objective that optimizes the contextual text embeddings and the graph parameters jointly, end to end. This sidesteps the scarcity of labels.
3. **Experimental Verification**: Experiments on a unique patent phrase similarity dataset show that the method significantly enhances the representation of patent phrases and yields substantial improvements in both the self-supervised and the supervised settings.

### Formula and Symbol Explanation

- \( V \) is the set of patents; the \( i \)-th patent is denoted \( v_i \in V \), and \( N = |V| \).
- \( U \) is the set of phrases; the \( j \)-th phrase is denoted \( u_j \in U \), and \( M = |U| \).
- \( E_c \in \mathbb{R}^{N \times N} \) is the adjacency matrix of citation relationships between patents: \( E_c(i, j) = 1 \) if patent \( v_i \) cites patent \( v_j \), and 0 otherwise.
- \( E_r \in \mathbb{R}^{N \times M} \) is the adjacency matrix of patent-phrase relationships: \( E_r(i, j) = 1 \) if phrase \( u_j \) appears in patent \( v_i \), and 0 otherwise.
- \( G_u = (U_u, V_u, E_r^u, E_c^u) \) is the ego-graph (local subgraph) of phrase \( u \).
- \( f(u) \in \mathbb{R}^d \) is the text embedding of phrase \( u \).
- \( g(G_u) \in \mathbb{R}^d \) is the ego-graph embedding of phrase \( u \).
- \( \phi(u) = f(u) \oplus g(G_u) \) is the retrieval-augmented phrase embedding, where \( \oplus \) denotes element-wise addition.

### Conclusion

By combining global contextual information with self-supervised learning, the proposed method significantly improves patent phrase similarity inference. The experimental results show that it outperforms existing methods in both the self-supervised and the supervised settings, which points to its potential value for patent analysis.
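To make the notation above concrete, the following is a minimal sketch of how a phrase's ego-graph could be read off \( E_c \) and \( E_r \), and how \( \phi(u) = f(u) \oplus g(G_u) \) could then be formed. It is an illustration under stated assumptions, not the authors' implementation: the function names `ego_graph_patents`, `augmented_embedding`, and `cosine` are hypothetical, the toy matrices and embeddings are made up, and \( g(G_u) \) is approximated here by mean-pooling patent embeddings, whereas the paper learns graph parameters end to end.

```python
import numpy as np

def ego_graph_patents(E_c, E_r, j):
    """Return (focal, neighbours) patent indices for phrase u_j.

    focal: patents in which the phrase appears (E_r[i, j] == 1).
    neighbours: patents that a focal patent cites, or that cite a focal patent.
    """
    focal = np.flatnonzero(E_r[:, j])
    cited_by_focal = E_c[focal].sum(axis=0) > 0    # focal patent cites these
    citing_focal = E_c[:, focal].sum(axis=1) > 0   # these cite a focal patent
    linked = np.flatnonzero(cited_by_focal | citing_focal)
    neighbours = np.setdiff1d(linked, focal)
    return focal, neighbours

def augmented_embedding(f_u, patent_emb, focal, neighbours):
    """phi(u) = f(u) + g(G_u), with g ASSUMED to be a mean over ego-graph patents."""
    nodes = np.concatenate([focal, neighbours])
    g_u = patent_emb[nodes].mean(axis=0) if nodes.size else np.zeros_like(f_u)
    return f_u + g_u  # element-wise addition

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data (made up): N = 4 patents, M = 2 phrases, d = 3.
E_c = np.array([[0, 1, 0, 0],   # patent 0 cites patent 1
                [0, 0, 0, 0],
                [1, 0, 0, 0],   # patent 2 cites patent 0
                [0, 0, 0, 0]])
E_r = np.array([[1, 0],         # phrase 0 appears in patent 0
                [0, 0],
                [0, 0],
                [0, 1]])        # phrase 1 appears in patent 3
patent_emb = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [1.0, 1.0, 1.0]])
phrase_emb = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])

focal0, neigh0 = ego_graph_patents(E_c, E_r, 0)
focal1, neigh1 = ego_graph_patents(E_c, E_r, 1)
phi0 = augmented_embedding(phrase_emb[0], patent_emb, focal0, neigh0)
phi1 = augmented_embedding(phrase_emb[1], patent_emb, focal1, neigh1)
similarity = cosine(phi0, phi1)
```

In this toy setup, phrase 0 picks up context from patents 1 and 2 through the citation links of its focal patent 0, while phrase 1 sees only its own focal patent. A learned graph encoder would replace the mean-pooling step.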

Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs
