Towards Enabling FAIR Dataspaces Using Large Language Models

Benedikt T. Arnold, Johannes Theissen-Lipp, Diego Collarana, Christoph Lange, Sandra Geisler, Edward Curry, Stefan Decker

2024-03-19

Abstract: Dataspaces have recently gained adoption across various sectors, including traditionally less digitized domains such as culture. Leveraging Semantic Web technologies helps to make dataspaces FAIR, but their complexity poses a significant challenge to the adoption of dataspaces and increases their cost. The advent of Large Language Models (LLMs) raises the question of how these models can support the adoption of FAIR dataspaces. In this work, we demonstrate the potential of LLMs in dataspaces with a concrete example. We also derive a research agenda for exploring this emerging field.

Computation and Language

What problem does this paper attempt to address?

The paper primarily explores how Large Language Models (LLMs) can support the development of FAIR dataspaces. Specifically:

1. **Research Background and Challenges**:
   - Dataspaces have gained adoption across various sectors, including traditionally less digitized domains such as culture.
   - Although Semantic Web technologies help to make dataspaces FAIR, their complexity poses a significant barrier to the adoption of dataspaces and increases their cost.
2. **Application Potential of LLMs**:
   - The authors demonstrate the potential of LLMs (using GPT-4 as an example) in dataspaces, including tasks such as extending semantic metadata schemas, creating instances, and understanding semantic data.
   - Practical examples illustrate how LLMs can simplify these tasks and enhance the findability, accessibility, interoperability, and reusability of data (the FAIR principles).
3. **Research Agenda**:
   - A series of research directions is proposed for better using LLMs to support the FAIR principles in dataspaces, including:
     - Designing interactive or automated assistance systems;
     - Model fine-tuning and prompt engineering;
     - Integration of Knowledge Graphs (KGs) and LLMs;
     - Data sovereignty issues with open models;
     - Energy efficiency and latency optimization;
     - Security considerations.

Overall, the paper aims to explore how LLMs can overcome existing technical barriers, thereby reducing the cost and the technical expertise required to achieve FAIR dataspaces.
