Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations
Sarah Riepenhausen,Max Blumenstock,Christian Niklas,Stefan Hegselmann,Philipp Neuhaus,Alexandra Meidt,Cornelia Püttmann,Michael Storck,Matthias Ganzinger,Julian Varghese,Martin Dugas
DOI: https://doi.org/10.1055/s-0044-1786839
IF: 1.8
2024-05-15
Methods of Information in Medicine
Abstract:
Background: Structural metadata from the majority of clinical studies and routine health care systems is not yet available to the scientific community.
Objective: To provide an overview of the contents available in the Portal of Medical Data Models (MDM Portal).
Methods: The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.
Results: The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, of which 66,373 are unique.
Conclusion: To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase the compatibility of medical datasets and can serve as a large expert-annotated medical text corpus for natural language processing.

Author contributions: S.R.: manuscript writing, statistics, revision, (supervision of) data model creation and annotation, research of available data models. M.B.: software development, revision, export of metadata and code. C.N.: revision, supervision of data model creation and annotation, research of available data models. S.H.: software architecture and development. P.N.: software development and supervision thereof, revision, export of metadata and code. A.M.: project management, dissemination concept, writing, and revision. C.P.: data model creation and annotation, research of available data models. M.S.: software development and supervision thereof. M.G.: software development and supervision thereof. J.V.: software development, writing and revision, (supervision of) data model creation and annotation, research of available data models. M.D.: Principal Investigator of the MDM Portal, conceptualization, selection of data models, supervision of software development, manuscript writing.

Received: 21 October 2021
Accepted: 29 March 2024
Article published online: 13 May 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/) Georg Thieme Verlag KG Stuttgart · New York
health care sciences & services,computer science, information systems,medical informatics
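
The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the three ratios are derived here rather than stated in the abstract. The codes-per-element ratio additionally assumes that all assigned UMLS codes belong to the 805,308 annotated elements.

```python
# Counts copied from the Results paragraph of the abstract.
annotated_elements = {
    "items": 554_352,
    "item groups": 58_101,
    "code list items": 192_855,
}
reported_total_elements = 805_308
data_models = 25_257
assigned_umls_codes = 1_609_225
unique_umls_codes = 66_373

# The three element types should add up to the reported total.
total_elements = sum(annotated_elements.values())

# Derived ratios (not stated in the abstract; computed for illustration).
# Assumption: every assigned code is attached to one of the annotated elements.
codes_per_element = assigned_umls_codes / reported_total_elements
elements_per_model = reported_total_elements / data_models
assignments_per_unique_code = assigned_umls_codes / unique_umls_codes

print(f"sum of element types:        {total_elements:,}")
print(f"matches reported total:      {total_elements == reported_total_elements}")
print(f"codes per annotated element: {codes_per_element:.2f}")
print(f"annotated elements per model: {elements_per_model:.2f}")
print(f"assignments per unique code: {assignments_per_unique_code:.2f}")
```

Under these assumptions, the breakdown by element type is consistent with the reported total, and each annotated element carries roughly two UMLS codes on average.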