LLM-based Extraction of Contradictions from Patents

Stefan Trapp, Joachim Warschat
2024-03-21
Abstract: Since the 1950s, TRIZ has shown that patents and the technical contradictions they solve are an important source of inspiration for the development of innovative products. However, TRIZ is a heuristic based on a historic patent analysis and does not make use of the ever-increasing number of recent technological solutions in current patents. Because of the huge number of patents, their length, and, not least, their complexity, modern patent retrieval and patent analysis need to go beyond keyword-oriented methods. Recent advances in patent retrieval and analysis mainly focus on dense vectors based on neural Transformer language models like Google's BERT. They are used, for example, for dense retrieval, question answering, summarization, and key concept extraction. One research focus within patent summarization and key concept extraction is generic inventive concepts, or TRIZ concepts, such as problems, solutions, advantages of the invention, parameters, and contradictions. Having succeeded rule-based approaches, fine-tuned BERT-like language models for sentence-wise classification represent the state of the art in inventive concept extraction. While they work comparatively well for basic concepts like problems or solutions, contradictions, as a more complex abstraction, remain a challenge for these models. This paper goes one step further: it presents a method to extract TRIZ contradictions from patent texts based on prompt engineering using a generative Large Language Model (LLM), namely OpenAI's GPT-4. Contradiction detection, sentence extraction, contradiction summarization, parameter extraction, and assignment to the 39 abstract TRIZ engineering parameters are all performed in a single prompt using the LangChain framework. Our results show that "off-the-shelf" GPT-4 is a serious alternative to existing approaches.
Computation and Language
What problem does this paper attempt to address?
This paper addresses how to extract technical contradictions (TRIZ contradictions) from patent texts effectively. Traditional patent retrieval and analysis methods are keyword-based, and they become insufficient as the number, length, and complexity of patents grow. The paper notes that existing AI methods built on Transformer language models such as BERT can identify basic inventive concepts, but that extracting contradictions remains a challenge for them. It proposes a new approach that uses a large language model (LLM), specifically OpenAI's GPT-4, to extract TRIZ contradictions through prompt engineering. The study uses the existing patent dataset "PaGAN" to demonstrate GPT-4's ability to extract TRIZ contradictions from the "background" section of patents from the United States Patent and Trademark Office (USPTO). Comparing GPT-4's output with the annotated sentences from PaGAN, the paper reports a high F1 score of 0.93 for contradiction extraction. The paper also reviews existing techniques for extracting inventive concepts, especially contradictions: rule-based NLP methods, fine-tuned BERT models, and complex multi-stage approaches such as PaTRIZ. Although PaTRIZ performs well in certain respects, it still struggles with contradictions. In contrast, the untuned GPT-4 model is shown to be a viable alternative. Overall, the paper aims to show how recent AI technologies, particularly LLMs, can automatically detect and extract technical contradictions from patent literature more effectively than traditional methods.
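To make the F1 comparison concrete, here is a minimal Python sketch of how sentences extracted by a model could be scored against annotated sentences. The exact evaluation protocol is not given here, so the sentence-index representation, the `sentence_f1` function, and the toy data are all hypothetical illustrations rather than the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def sentence_f1(predicted: set[int], annotated: set[int]) -> Scores:
    """Score predicted sentence indices against annotated ones.

    Assumption: each sentence of a patent section is identified by its
    index, and a prediction counts as correct only if that exact index
    was annotated as part of a contradiction.
    """
    true_pos = len(predicted & annotated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Toy data (hypothetical): the model flags sentences 2, 3 and 5,
# while the annotators marked sentences 2, 3 and 4.
predicted = {2, 3, 5}
annotated = {2, 3, 4}
scores = sentence_f1(predicted, annotated)
print(scores)
```

In this toy case two of the three predicted sentences match the annotation, so precision, recall, and F1 all equal 2/3. A corpus-level score could be obtained by summing true positives and set sizes over all patents before dividing, which is again an assumption about the setup rather than something stated in the summary.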