vitaLITy 2: Reviewing Academic Literature Using Large Language Models

Hongye An, Arpit Narechania, Emily Wall, Kai Xu
2024-08-24
Abstract: Academic literature reviews have traditionally relied on techniques such as keyword searches and accumulation of relevant back-references, using databases like Google Scholar or IEEEXplore. However, both the precision and accuracy of these search techniques are limited by the presence or absence of specific keywords, making literature review akin to searching for needles in a haystack. We present vitaLITy 2, a solution that uses a Large Language Model (LLM)-based approach to identify semantically relevant literature in a textual embedding space. We include a corpus of 66,692 papers from 1970-2023, which are searchable through text embeddings created by three language models. vitaLITy 2 contributes a novel Retrieval Augmented Generation (RAG) architecture and can be interacted with through an LLM with augmented prompts, including summarization of a collection of papers. vitaLITy 2 also provides a chat interface that allows users to perform complex queries without learning any new programming language. This also enables users to take advantage of the knowledge captured in the LLM from its enormous training corpus. Finally, we demonstrate the applicability of vitaLITy 2 through two usage scenarios. vitaLITy 2 is available as open-source software at https://vitality-vis.github.io.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper aims to address several key issues in academic literature retrieval:

1. **Efficient Retrieval of Relevant Literature**: Traditional methods such as keyword search are limited in precision and accuracy when retrieving from a large volume of academic literature, because they depend on the presence of specific keywords. This makes literature retrieval like finding a needle in a haystack.
2. **Comprehensive Visualization of Literature Relationships**: Existing literature review methods struggle to capture subtle connections between documents, which hinders the effective extraction of relevant information.
3. **Literature Summarization**: Faced with a large amount of literature, traditional literature review methods are often not systematic enough and lack sufficient coverage of evidence.

Specifically, the paper proposes vitaLITy 2, an open-source visualization tool for literature retrieval based on large language models (LLMs). It uses text embeddings to identify semantically related documents in the embedding space and provides a corpus of 66,692 papers from 1970 to 2023. vitaLITy 2 introduces a novel Retrieval-Augmented Generation (RAG) architecture that supports interaction with the system through natural language, without the need to learn a new programming language. Additionally, users can use the system to summarize a collection of papers and pose context-related questions. The applicability of vitaLITy 2 is demonstrated through two usage scenarios.
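
To make the retrieval-then-prompt idea concrete, here is a minimal sketch of the general RAG pattern described above: rank papers by similarity to a query in an embedding space, then place the top matches into an augmented prompt for an LLM. This is not the vitaLITy 2 implementation. The three-paper corpus, the function names, and the prompt wording are all hypothetical, and the bag-of-words `embed` function is only a stand-in for the language-model embeddings the paper uses, so it matches shared words rather than meaning.

```python
import math
from collections import Counter

# Hypothetical toy corpus (the real system indexes 66,692 papers).
PAPERS = {
    "p1": "Visual analytics for exploring text embeddings of academic papers",
    "p2": "A keyword search engine for digital libraries",
    "p3": "Deep learning for protein structure prediction",
}


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a language-model embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return the ids of the k papers most similar to the query."""
    q = embed(query)
    ranked = sorted(
        PAPERS,
        key=lambda pid: cosine(q, embed(PAPERS[pid])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, k=2):
    """Assemble an augmented prompt: retrieved papers as context, then the question."""
    context = "\n".join(f"[{pid}] {PAPERS[pid]}" for pid in retrieve(question, k))
    return (
        "Answer using only the papers listed below.\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("text embeddings of papers"))
```

In a real deployment the `embed` function would call an embedding model and the vectors would be precomputed and stored in an index; the ranking and prompt-assembly steps would keep the same shape.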