Abstract: Background: In recent years, research data warehouses have increasingly become a focus of medical research. Nevertheless, only a few center-independent infrastructure solutions are available. They aim to provide a consolidated view of medical data from various sources such as clinical trials, electronic health records, epidemiological registries, or longitudinal cohorts. The i2b2 framework is a well-established solution for such repositories, but it lacks support for importing and integrating clinical data and metadata.
Objectives: The goal of this project was to develop a platform for easy integration and administration of data from heterogeneous sources, to provide capabilities for linking them to medical terminologies, and to allow data streams to be transformed and mapped for user-specific views.
Methods: A suite of three tools has been developed: the i2b2 Wizard for simplifying administration of i2b2, the IDRT Import and Mapping Tool for loading clinical data from various formats and sources such as CSV, SQL, CDISC ODM, or biobanks, and the IDRT i2b2 Web Client Plugin for advanced export options. The Import and Mapping Tool also includes an ontology editor for rearranging and mapping patient data and structures as well as annotating clinical data with medical terminologies, primarily those used in Germany (ICD-10-GM, OPS, ICD-O, etc.).
Results: With the three tools in place, new i2b2-based research projects can be created, populated, and customized to researchers' needs in a few hours. Data and metadata from different databases can be amalgamated easily. With regard to data privacy, a pseudonymization service can be plugged in. Using common ontologies and reference terminologies rather than project-specific ones leads to a consistent understanding of the data semantics.
Conclusions: i2b2's promise is to enable clinical researchers to devise and test new hypotheses even without deep knowledge of statistical programming. The approach presented here has been tested in a number of scenarios with millions of observations and tens of thousands of patients. Researchers who were initially mostly observers were, once trained, able to construct new analyses on their own. Early feedback indicates that researchers most appreciate timely and extensive access to their "own" data, but the approach also lowers the barrier for other tasks, such as checking data quality and completeness (missing data, wrong coding).

The Information Retrieval Experiment Platform

Overview of EIREX 2010: Computing

Overview of EIREX 2011: Crowdsourcing

repro_eval: A Python Interface to Reproducibility Measures of System-oriented IR Experiments

Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

Advancing Trace Recovery Evaluation - Applied Information Retrieval in a Software Engineering Context

Integrated Data Repository Toolkit (IDRT). A Suite of Programs to Facilitate Health Analytics on Heterogeneous Medical Data

Large-scale information retrieval in software engineering -- an experience report from industrial application

Establishing an Online Access Panel for Interactive Information Retrieval Research

Evaluating Temporal Persistence Using Replicability Measures

LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval

ir_explain: a Python Library of Explainable IR Methods

tieval: An Evaluation Framework for Temporal Information Extraction Systems

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Let's measure run time! Extending the IR replicability infrastructure to include performance aspects

Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models

InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Methodology for identifying study sites in scientific corpus

Evaluation of Temporal Change in IR Test Collections

Turkish Text Retrieval Experiments Using Lemur Toolkit