Abstract: The performance of medical research can be viewed and evaluated not only from the perspective of publication output but also from the perspective of economic exploitability. Patents can represent the exploitation of research results and thus the transfer of knowledge from research to industry. In this study, we set out to identify publication-patent pairs in order to use patents as a proxy for the economic impact of research. To identify these pairs, we matched scholarly publications and patents by comparing the names of authors and inventors. To resolve the ambiguities that arise in this name-matching process, we expanded our approach with two additional filter features, one used to assess the similarity of text content, the other to identify common references in the two document types. To evaluate text similarity, we extracted and transformed technical terms from a medical ontology (MeSH) into numerical vectors using word embeddings. We then calculated the results of the two supporting features over an example five-year period. Furthermore, we developed a statistical procedure that can be used to determine valid patent classes for the domain of medicine. Our complete data processing pipeline is freely available, from the raw data of the two document types right through to the validated publication-patent pairs.
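The name-matching step in the abstract can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' pipeline: every name is normalized to a lowercase, accent-stripped "surname + first initial" key, and a publication and a patent become a candidate pair when they share at least one key. The functions `name_key` and `candidate_pairs`, the record layout, and the toy data are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def name_key(full_name: str) -> str:
    """Normalize a personal name to a 'surname initial' key (hypothetical scheme)."""
    # Decompose accented characters and drop the combining marks,
    # so that 'Müller' and 'Muller' produce the same key.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
        .strip()
    )
    if "," in ascii_name:
        # 'Surname, Given' order, as often used in bibliographic records.
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        # 'Given Surname' order, as often used for inventor names.
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = given[:1]
    return f"{surname} {initial}".strip()


def candidate_pairs(
    publications: Dict[str, List[str]], patents: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """Return (publication_id, patent_id) pairs sharing at least one name key."""
    # Index patents by the normalized keys of their inventors.
    index: Dict[str, Set[str]] = defaultdict(set)
    for patent_id, inventors in patents.items():
        for inventor in inventors:
            index[name_key(inventor)].add(patent_id)
    pairs: Set[Tuple[str, str]] = set()
    for pub_id, authors in publications.items():
        for author in authors:
            for patent_id in index.get(name_key(author), set()):
                pairs.add((pub_id, patent_id))
    return pairs


# Toy records: author lists per publication, inventor lists per patent.
publications = {"pub1": ["Müller, Anna", "Lee, Ji"], "pub2": ["Smith, John"]}
patents = {"pat1": ["Anna Muller"], "pat2": ["Jane Smith"]}
print(sorted(candidate_pairs(publications, patents)))
```

In the toy data, `pub2` and `pat2` are paired only because "John Smith" and "Jane Smith" collapse to the same key. This is the kind of homonym ambiguity that the content and reference filters are meant to resolve.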

What problem does this paper attempt to address?

The problem this paper addresses is how to identify the links between academic publications and patents more accurately, in order to assess the extent to which research results are transferred into economic activity. Specifically, the authors match academic publications with patents so that patents can serve as a proxy indicator of the economic impact of research, and they aim to reduce the ambiguity that arises in the name-matching process.

### Main problems:

1. **Assessing the economic impact of research**: Traditional research evaluation relies mainly on publication counts and the acquisition of external funding, but these measures do not fully reflect the economic value of research results in practical applications.
2. **Reducing the ambiguity of name matching**: When publications are matched with patents, different people can share the same name (homonyms), so matching on author and inventor names alone produces incorrect pairs. Additional filter features are therefore needed to improve matching accuracy.

### Solutions:

To address these problems, the authors propose the following measures:

- **Text content similarity**: Medical terms (such as terms from the MeSH vocabulary) are converted into numerical vectors using word embeddings. Specifically, a BERT model is used to generate a vector representation of each document, and the content similarity of two documents with vectors \(\mathbf{A}\) and \(\mathbf{B}\) is evaluated by their cosine similarity:

  \[
  \text{Cosine Similarity}=\frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|}
  \]

- **Shared-reference analysis**: References cited by both the patent and the publication are identified to further confirm whether the two documents concern the same research topic. This helps to reduce many-to-many relationships and improves the accuracy of matching.
- **Patent class filtering**: A statistical method automatically selects valid patent classes (IPC) to ensure that matched patents and publications belong to the same field. The patent classes suitable for the medical domain are determined by comparing the patent class distributions of different subsets in Q-Q plots.

### Summary:

The core objective of this paper is to assess more accurately how research results are transferred into economic activity by improving the matching of academic publications and patents. By introducing text similarity, shared-reference analysis, and patent class filtering, the authors reduce the ambiguity in the matching process and provide a reproducible data processing pipeline.
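The two filter features can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. Each document is represented by the mean of the embedding vectors of the MeSH terms it contains. The toy three-dimensional `EMBEDDINGS` table stands in for a real trained model, and averaging term vectors is our simplification of the BERT document vectors mentioned above. Content similarity uses the cosine formula given earlier, and the reference filter counts identifiers cited by both documents. The decision rule (keep a pair if either filter passes) and the thresholds `min_similarity` and `min_shared_refs` are assumptions chosen for illustration.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Set

# Hypothetical toy embeddings for a few MeSH terms; a real pipeline would
# load vectors from a trained word-embedding model instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "neoplasms": [0.9, 0.1, 0.0],
    "antineoplastic agents": [0.8, 0.2, 0.1],
    "heart failure": [0.0, 0.9, 0.3],
    "stents": [0.1, 0.8, 0.4],
}


def document_vector(terms: Iterable[str]) -> Optional[List[float]]:
    """Mean of the embeddings of the known terms, or None if none are known."""
    vectors = [EMBEDDINGS[t] for t in terms if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """A.B / (|A| |B|); returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shared_references(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Number of cited identifiers (e.g. DOIs) present in both reference lists."""
    return len(refs_a & refs_b)


def keep_pair(
    pub_terms: Iterable[str],
    pat_terms: Iterable[str],
    pub_refs: Set[str],
    pat_refs: Set[str],
    min_similarity: float = 0.8,
    min_shared_refs: int = 1,
) -> bool:
    """Keep a name-matched pair if either filter supports it (assumed rule)."""
    a, b = document_vector(pub_terms), document_vector(pat_terms)
    similarity = cosine_similarity(a, b) if a and b else 0.0
    return (
        similarity >= min_similarity
        or shared_references(pub_refs, pat_refs) >= min_shared_refs
    )


# Similar MeSH content, no shared references: kept by the content filter.
print(keep_pair(["neoplasms", "antineoplastic agents"], ["antineoplastic agents"], set(), set()))
# Unrelated content and disjoint references: rejected.
print(keep_pair(["neoplasms"], ["stents"], {"10.1000/a"}, {"10.1000/b"}))
```

Requiring both filters instead of either one would trade recall for precision. Which rule fits best depends on how often patents cite the scholarly literature at all.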

Patent-publication pairs for the detection of knowledge transfer from research to industry: reducing ambiguities with word embeddings and references
