$\texttt{PatentAgent}$: Intelligent Agent for Automated Pharmaceutical Patent Analysis

Xin Wang, Yifan Zhang, Xiaojing Zhang, Longhui Yu, Xinna Lin, Jindong Jiang, Bin Ma, Kaicheng Yu

2024-10-26

Abstract: Pharmaceutical patents play a vital role in biochemical industries, especially in drug discovery, providing researchers with unique early access to data, experimental results, and research insights. With the advancement of machine learning, patent analysis has evolved from manual labor to tasks assisted by automatic tools. However, there is still no unified agent that assists with every aspect of patent analysis, from patent reading to core chemical identification. Leveraging the capabilities of Large Language Models (LLMs) to understand requests and follow instructions, we introduce the $\textbf{first}$ intelligent agent in this domain, $\texttt{PatentAgent}$, poised to advance and potentially revolutionize the landscape of pharmaceutical research. $\texttt{PatentAgent}$ comprises three key end-to-end modules -- $\textit{PA-QA}$, $\textit{PA-Img2Mol}$, and $\textit{PA-CoreId}$ -- that respectively perform (1) patent question-answering, (2) image-to-molecular-structure conversion, and (3) core chemical structure identification, addressing the essential needs of scientists and practitioners in pharmaceutical patent analysis. Each module of $\texttt{PatentAgent}$ demonstrates significant effectiveness, owing to its updated algorithm and the synergistic design of the $\texttt{PatentAgent}$ framework. $\textit{PA-Img2Mol}$ outperforms existing methods across the CLEF, JPO, UOB, and USPTO patent benchmarks with an accuracy gain between 2.46% and 8.37%, while $\textit{PA-CoreId}$ achieves an accuracy improvement ranging from 7.15% to 7.62% on the PatentNetML benchmark. Our code and dataset will be publicly available.

Machine Learning, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The problem this paper attempts to address is how to efficiently and accurately analyze and extract key information from patents in drug development. Specifically, the paper points out several major issues in current drug patent analysis:

1. **Manual methods are time-consuming and labor-intensive**: Traditional methods such as manual review and keyword search, although considered the gold standard for patent analysis, require scientists to spend a significant amount of time and effort to extract information. They also rely on human experts to interpret complex chemical information, which is costly and inefficient.
2. **Existing computational tools lack an overall solution**: Current computational tools such as text mining and chemical structure exploration can independently accomplish certain tasks but lack a unified standard and integration. This makes coordination between multiple modules difficult, especially for researchers without a computer science background, posing obstacles to the use of these tools.
3. **Inaccurate identification of core compounds**: In drug patents, identifying the core compound structure from hundreds of chemical substances is an important task. However, the accuracy of existing tools for this task remains low, even approaching the level of random guessing.

To address these issues, the paper proposes an intelligent agent system named **PatentAgent**, which aims to achieve full-process automated analysis from patent reading to core chemical structure identification by integrating large language models (LLMs) and other advanced computational methods. PatentAgent includes three main modules:

1. **PA-QA**: A question-answering chatbot capable of accurately responding to users' natural language queries about patents.
2. **PA-Img2Mol**: A deep learning model ensemble that can convert chemical structure images into molecular expressions (SMILES).
3. **PA-CoreId**: A machine learning classifier that can identify core chemical structures from various chemical substances.

Through the collaborative work of these modules, PatentAgent can significantly improve the accuracy and efficiency of drug patent analysis, reducing the time and effort required from researchers.
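To make the three-module layout concrete, the following is a minimal sketch of how a single agent might route requests to PA-QA, PA-Img2Mol, and PA-CoreId. It is not the authors' implementation: the class `PatentAgentSketch`, the task names, and the three stub functions are all hypothetical, and the stubs return placeholder values where the real modules would call an LLM, a model ensemble, and a trained classifier.

```python
from typing import Any, Callable, Dict, List


class PatentAgentSketch:
    """Toy router that maps a task name to a module callable (hypothetical design)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[[Any], Any]] = {}

    def register(self, task: str, module: Callable[[Any], Any]) -> None:
        # Associate a task name with the callable that handles it.
        self._modules[task] = module

    def run(self, task: str, payload: Any) -> Any:
        # Dispatch the payload to the registered module, or fail loudly.
        if task not in self._modules:
            raise ValueError(f"unknown task: {task!r}")
        return self._modules[task](payload)


# Placeholder modules (assumptions for illustration only).
def stub_qa(question: str) -> str:
    # Stands in for PA-QA; a real module would query an LLM over the patent text.
    return f"[stub answer to: {question}]"


def stub_img2mol(image_path: str) -> str:
    # Stands in for PA-Img2Mol; always returns the SMILES string for ethanol.
    return "CCO"


def stub_coreid(smiles_list: List[str]) -> str:
    # Stands in for PA-CoreId; returns the first candidate, not a real prediction.
    return smiles_list[0]


agent = PatentAgentSketch()
agent.register("qa", stub_qa)
agent.register("img2mol", stub_img2mol)
agent.register("coreid", stub_coreid)

# Chain two modules: convert an image to SMILES, then pick a core structure.
smiles = agent.run("img2mol", "figure_1.png")
core = agent.run("coreid", [smiles, "c1ccccc1"])
print(core)
```

Keeping the modules behind a common callable interface is one way to let a user without a programming background issue a single request while the agent decides which component handles it; the actual coordination mechanism in PatentAgent may differ.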

Intelligent System for Automated Molecular Patent Infringement Assessment

DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

Automated patent extraction powers generative modeling in focused chemical spaces

Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analysis

AI for Patents: A Novel Yet Effective and Efficient Framework for Patent Analysis

PatentNetML: A Novel Framework for Predicting Key Compounds in Patents Using Network Science and Machine Learning

A multi-agent-driven robotic AI chemist enabling autonomous chemical research on demand

Intelligent compilation of patent summaries using machine learning and natural language processing techniques

An Autonomous Large Language Model Agent for Chemical Literature Data Mining

PatentGPT: A Large Language Model for Intellectual Property

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning

Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space

Annotated Chemical Patent Corpus: A Gold Standard for Text Mining

DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning

Automated Single-Label Patent Classification using Ensemble Classifiers

Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation

Named entity recognition in chemical patents using ensemble of contextual language models

BioInformatics Agent (BIA): Unleashing the Power of Large Language Models to Reshape Bioinformatics Workflow

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

DeepPatent: patent classification with convolutional neural networks and word embedding