Abstract 3558: An integrated data platform to support the international alliance for cancer early detection

Abstract: Cancer early detection is one of the most critical areas of cancer research, as it offers the greatest potential for improving patient outcomes. The International Alliance for Cancer Early Detection (ACED) is a global partnership of world-leading cancer research institutions in the UK and US, established in 2019 to accelerate and revolutionize research in this field. ACED brings together the expertise of the Canary Center at Stanford University, the University of Cambridge, the Knight Cancer Institute at Oregon Health & Science University, University College London, and the University of Manchester, together with Cancer Research UK, with the goal of catalyzing new collaborations and research in early detection. One of the major challenges to international collaboration across institutions is collecting, sharing, and discovering datasets among researchers and institutions. Collaboration on innovative data science depends on three fundamental functions: (1) controlled data-sharing mechanisms; (2) structured metadata to support data exploration and discovery; and (3) easy computational access to the data. To support such collaboration, ACED began developing an Integrated Data Platform (IDP). The ACED-IDP is based on the Gen3 software platform developed by the University of Chicago's Center for Translational Data Science. Gen3 builds on software systems originally developed for the NCI's Genomic Data Commons and has been used in several other data projects, including the Blood Profiling Atlas in Cancer (BloodPAC), the Australian BioCommons, and numerous other research data platforms. The structure of ACED called for several innovations in the IDP, because each member institution has its own computing infrastructure, a separate authentication platform, and heterogeneous data types underlying its research.
The IDP pursued a cloud-ready strategy while remaining mindful of the substantial cloud egress fees that limit researchers' ability to download data. The Gen3 platform was extended with hybrid cloud support so that both on-premises and cloud object storage systems can be linked to it, which lets each institution share files through whichever mechanisms it sees fit. To unify the various datasets, the standard Gen3 schema was replaced with one derived from the Fast Healthcare Interoperability Resources (FHIR) standard. To support import from clinical datasets, tooling to import and export Observational Medical Outcomes Partnership (OMOP) data has been integrated. With this platform in place, we hope to advance international collaborations and accelerate early cancer detection research.

Citation Format: Brian Walsh, Liam Beckman, JD Burchett, Matthew Peterkort, Jordan Lee, Michael Fitzsimons, Peter Vassilatos, Binam Bajracharya, Craig Barnes, Jawad Qureshi, Robert Grossman, Carrie Yakura, Yaozhi Lu, Sarah Burge, Daniel Kelberman, Erin Watson, Kyle Ellrott. An integrated data platform to support the international alliance for cancer early detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3558.
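The abstract states that OMOP import/export tooling was integrated alongside a FHIR-derived schema, but it does not describe how records are mapped between the two. As a rough illustration only, the Python sketch below converts a dict shaped like a row of the OMOP CDM `person` table into a FHIR R4 `Patient` resource. It is not the ACED-IDP tooling: the function names are hypothetical, only the two standard OMOP gender concepts (8507 male, 8532 female) are mapped, and all other person fields (race, ethnicity, location, source identifiers) are ignored.

```python
"""Hypothetical sketch: map one OMOP CDM `person` row to a FHIR R4 Patient.

Not the ACED-IDP implementation. Field names follow the public OMOP CDM
`person` table; only two standard gender concepts are mapped.
"""
import json
from typing import Any, Dict, Optional

# OMOP standard gender concepts (8507 = MALE, 8532 = FEMALE). Any other
# concept id is mapped to the FHIR code "unknown" in this simplified sketch.
OMOP_GENDER_TO_FHIR = {8507: "male", 8532: "female"}


def _birth_date(person: Dict[str, Any]) -> Optional[str]:
    """Build a FHIR date (YYYY, YYYY-MM or YYYY-MM-DD) from OMOP birth fields."""
    year = person.get("year_of_birth")
    if year is None:
        return None
    month = person.get("month_of_birth")
    day = person.get("day_of_birth")
    if month is None:
        return f"{year:04d}"  # FHIR allows year-only partial dates
    if day is None:
        return f"{year:04d}-{month:02d}"  # year-month partial date
    return f"{year:04d}-{month:02d}-{day:02d}"


def person_to_fhir_patient(person: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a dict shaped like an OMOP `person` row into a FHIR Patient dict."""
    patient: Dict[str, Any] = {
        "resourceType": "Patient",
        # FHIR resource ids are strings; reuse person_id for traceability.
        "id": str(person["person_id"]),
        "gender": OMOP_GENDER_TO_FHIR.get(person.get("gender_concept_id"), "unknown"),
    }
    birth_date = _birth_date(person)
    if birth_date is not None:
        patient["birthDate"] = birth_date
    return patient


if __name__ == "__main__":
    row = {"person_id": 42, "gender_concept_id": 8532,
           "year_of_birth": 1961, "month_of_birth": 7, "day_of_birth": None}
    print(json.dumps(person_to_fhir_patient(row), indent=2))
```

A real converter would more likely resolve concepts through the OMOP vocabulary tables than through a hard-coded dictionary, and would map many more tables (conditions, measurements, specimens) into their corresponding FHIR resources.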
