Artificial Intuition: Efficient Classification of Scientific Abstracts

Harsh Sakhrani, Naseela Pervez, Anirudh Ravi Kumar, Fred Morstatter, Alexandra Graddy Reed, Andrea Belz

2024-07-09

Abstract: It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.
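The augmentation idea in the abstract can be illustrated with a small Python sketch. Following the summary of the method, it assumes that key terms are extracted from an abstract first and then expanded with background text. The frequency-based `extract_keywords`, the `STOPWORDS` set, and the `BACKGROUND` dictionary are hypothetical stand-ins: the summary does not name the keyword extractor, and in the actual workflow the background text comes from an LLM rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical stand-in for an LLM: maps a keyword to a short background
# description. The real workflow would prompt an LLM for this text.
BACKGROUND = {
    "propulsion": "propulsion systems generate thrust for spacecraft and launch vehicles",
    "sensor": "sensors measure physical quantities such as temperature or radiation",
}

# Hypothetical minimal stopword list for the frequency heuristic below.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "we", "is", "this", "with", "on"}


def extract_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stopword tokens.

    A simple frequency heuristic used here in place of the paper's
    keyword extraction algorithm. Ties keep first-encountered order.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(k)]


def augment(text: str, k: int = 3) -> str:
    """Append background text for each extracted keyword that has an entry."""
    extras = [BACKGROUND[kw] for kw in extract_keywords(text, k) if kw in BACKGROUND]
    return " ".join([text] + extras)


abstract = "A compact sensor for cubesat propulsion monitoring; the sensor tracks propulsion health."
print(extract_keywords(abstract))  # ['sensor', 'propulsion', 'compact']
print(augment(abstract))
```

Because the augmented string concatenates the original abstract with the added context, any downstream clustering sees both the authors' wording and the supplementary domain vocabulary.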

Artificial Intelligence

What problem does this paper attempt to address?

This paper explores how to coarsely classify scientific abstracts, a difficult task because abstracts are information-dense and lack context. The researchers propose an approach called "Artificial Intuition," which generates coarse domain-specific labels and assigns them appropriately. A large language model (LLM) supplies the metadata the task requires, in a process akin to supplementing the text with the background knowledge that underlies human intuition, and the authors propose a workflow around it.

In practice, the researchers first apply a keyword extraction algorithm to pull key terms from each abstract, then use the LLM to generate background information for those keywords, and finally cluster the augmented documents for classification. As a pilot case they use abstracts of NASA SBIR (Small Business Innovation Research) awards, and they develop new evaluation tools to complement standard performance metrics.

The paper states two main requirements: (1) a unified, coarse-grained, non-overlapping classification system that assigns each document in a collection to exactly one category; (2) an unsupervised method that does not rely on manual annotation yet handles the characteristics of scientific text, especially abstracts.

The researchers generate the label space through k-means clustering and propose a coverage measure to evaluate whether the labels comprehensively describe the document space. They also analyze how the number of clusters affects redundancy and coverage, and how to choose a threshold for label prediction that achieves high precision and recall.

Finally, the paper discusses future directions: validation on a wider range of document collections, handling long documents, generating multiple labels per document, and applications in business and public policy, such as tracking research trends or classifying industries with the label-generated metadata.
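The clustering and thresholded labeling described above can be sketched in plain Python. This is a toy illustration under several assumptions: documents are already vectors (here hand-made 2-D points rather than representations of augmented abstracts), clusters come from Lloyd's k-means seeded with the first k points, and a document receives the label of its most cosine-similar centroid only when that similarity meets a threshold; otherwise it abstains. The `coverage` function is a hypothetical proxy (the share of documents that receive any label), not the paper's coverage measure, and precision is computed over labeled documents while recall is computed over all documents, one common convention when a classifier may abstain.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(points, k, iters=20):
    """Lloyd's k-means with squared Euclidean distance.

    The first k points seed the centroids, so results are deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def assign(vec, centroids, threshold):
    """Index of the most similar centroid, or None (abstain) if below threshold."""
    sims = [cosine(vec, c) for c in centroids]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best if sims[best] >= threshold else None


def precision_recall(pred, gold):
    """Precision over labeled documents; recall over all documents."""
    labeled = [(p, g) for p, g in zip(pred, gold) if p is not None]
    correct = sum(p == g for p, g in labeled)
    precision = correct / len(labeled) if labeled else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def coverage(pred):
    """Hypothetical proxy: share of documents that receive any label."""
    return sum(p is not None for p in pred) / len(pred)


# Toy 2-D "document vectors" (hypothetical; real inputs would encode augmented abstracts).
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.7, 0.7]]
centroids = kmeans(docs, k=2)
pred = [assign(d, centroids, threshold=0.95) for d in docs]
gold = [0, 0, 1, 1, 1]  # hypothetical gold labels, already mapped to cluster indices
print(pred)                          # [0, 0, 1, 1, None]
print(precision_recall(pred, gold))  # (1.0, 0.8)
print(coverage(pred))                # 0.8
```

Here the ambiguous point `[0.7, 0.7]` falls below the threshold and is left unlabeled. Raising the threshold makes the labeler abstain more often, which tends to raise precision at the cost of recall and coverage; that is the trade-off the threshold selection is meant to balance.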


Qlarify: Recursively Expandable Abstracts for Directed Information Retrieval over Scientific Papers

Identifying the Development and Application of Artificial Intelligence in Scientific Text

Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks

Beyond original Research Articles Categorization via NLP

A New Algorithm for the Acquisition of Knowledge from Scientific Literature in Specific Fields Based on Natural Language Comprehension.

An AI aid to the editors. Exploring the possibility of an AI assisted article classification system

Knowledge AI: Fine-tuning NLP Models for Facilitating Scientific Knowledge Extraction and Understanding

Towards an understanding and explanation for mixed-initiative artificial scientific text detection

Scientific intuition inspired by machine learning generated hypotheses

Improving accessibility of scientific research by artificial intelligence—An example for lay abstract generation

REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

ByteScience: Bridging Unstructured Scientific Literature and Structured Data with Auto Fine-tuned Large Language Model in Token Granularity

Using General Large Language Models to Classify Mathematical Documents

Machine Identification of High Impact Research through Text and Image Analysis

Why do you cite? An investigation on citation intents and decision-making classification processes

A deep learning classifier for sentence classification in biomedical and computer science abstracts

Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning

External Reasoning: Towards Multi-Large-Language-Models Interchangeable Assistance with Human Feedback

Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles

NLP meets Materials Science: Quantifying the presentation of materials data in scientific literature