Abstract: The NSF-funded Robust Epidemic Surveillance and Modeling (RESUME) project successfully convened a workshop entitled "High-performance computing and large-scale data management in service of epidemiological modeling" at the University of Chicago on May 1-2, 2023. This was part of a series of workshops designed to foster sustainable and interdisciplinary co-design for predictive intelligence and pandemic prevention. The event brought together 31 experts in epidemiological modeling, high-performance computing (HPC), HPC workflows, and large-scale data management to develop a shared vision for capabilities needed for computational epidemiology to better support pandemic prevention. Through the workshop, participants identified key areas in which HPC capabilities could be used to improve epidemiological modeling, particularly in supporting public health decision-making, with an emphasis on HPC workflows, data integration, and HPC access. The workshop explored nascent HPC workflow and large-scale data management approaches currently in use for epidemiological modeling and sought to draw from approaches used in other domains to determine which practices could be best adapted for use in epidemiological modeling. This report documents the key findings and takeaways from the workshop.

What problem does this paper attempt to address?

The paper addresses how high-performance computing (HPC) and large-scale data management can improve epidemiological modeling, especially in support of public health decision-making. Specifically, it explores two areas:

1. **Evaluating current computational resources for epidemiological modeling**
   - **Advantages**: Many researchers have access to large-scale computational resources and use local and cloud computing effectively. These systems usually have large storage capacities and can handle the data volumes that large-scale epidemiological modeling requires. Some research teams have also developed specialized software tools to interact with these systems.
   - **Disadvantages**: Data locality problems reduce the efficiency of data analysis in some settings, while other systems are limited by job constraints or by the use of multiple versions of job-scheduling tools. Switching from one HPC system to another is usually difficult, mainly because of the need to understand performance differences, manage technical debt, and cope with missing documentation and training. Many users report difficulty using resources efficiently and tuning systems for optimal performance. In addition, tasks that could benefit from automation, such as hyperparameter checking and model calibration, usually require manual handling, which adds time and complexity to the modeling process.
   - **Capabilities in an ideal world**: A language-independent and highly automated system would allow easy collaboration and let researchers focus on science rather than computing. This system would include a comprehensive model library, accompanied by code, documentation, and parameter repositories. It would promote model reuse by providing a pre-rated model catalog that meets different needs. Key functions would include modular problem packaging, a set of rapidly deployable calibration models, and clearly documented parameter sources. The system would also support model comparison, integrate HPC-friendly calibration capabilities, and provide tools for visualizing shared results. Automation would handle error checking, task termination, and retries, and would provide a feedback loop for performance evaluation. A high degree of user-friendliness would allow easy extraction of detailed modeling outputs and switching between different model formulations. Other attributes would include code containerization, an interactive cloud system, continuous integration for HPC resources, and a software abstraction layer for flexibility and extensive testing.
2. **Evaluating current data practices**
   - **Advantages**: Many current practices demonstrate effective use and management of diverse data sets. Public data is often used, and there are some well-structured practices for cleaning and preparing this data for further processing. For sensitive data, secure enclaves have been developed to store and analyze data while protecting privacy and confidentiality. Other practices include establishing cooperation agreements to access proprietary data. Automated processes handle routine data tasks, such as downloading. Access to HPC resources and dedicated queues is considered an advantage. There are also some promising efforts to develop better data tools, such as real-time data summaries and quality-checking systems.
   - **Disadvantages**: Data locality is a major problem because data is often too large or too sensitive to move, which can impede access and computation. Some parts of the data pipeline lack automation, increasing manual labor and the potential for errors. Several participants reported a need for more transparent and user-friendly tools and for simple ways to explore data during model development. Problems with tracking the sources of data and model parameters, with data cleaning, and with interpretation are common. Data sharing is also a problem, with privacy concerns and academic competition often being limiting factors.
   - **Capabilities in an ideal world**: Data storage and access would be secure, seamless, and require minimal user attention while adapting to specific requirements, such as those for protected health information. Intentional databases would simplify data management, and cleaning, validation, and quality-control processes would be automated to improve efficiency. Model outputs would be standardized, tracked, and made public in a FAIR (Findable, Accessible, Interoperable, Reusable) manner, allowing different stakeholders to reuse data. Standardized data APIs would facilitate the coordination of data from various sources, and automated data retrieval would speed up access. Clear version control and provenance would ensure data reliability. The system would make data available where needed, whether stored in a database or as a flat file. APIs would manage access to protected data according to user permissions. Interaction with HPC resources would become easier, with low-barrier data preparation, alternatives to the file system, simplified authentication, and the ability to visualize HPC outputs.
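
The version control and provenance capability described above can be illustrated with a minimal Python sketch. It is not taken from the workshop report: the function names `provenance_record` and `verify`, the record fields, the source label `hypothetical-case-feed`, and the sample CSV rows are all assumptions made up for illustration. The idea is only that a checksum stored alongside a source and version label lets a later user confirm that a data snapshot is unchanged.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(data: bytes, source: str, version: str) -> dict:
    """Build a small provenance record for one dataset snapshot.

    The field names here are hypothetical, not a published standard.
    """
    return {
        "source": source,
        "version": version,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(data: bytes, record: dict) -> bool:
    """Return True if the data still matches the recorded checksum."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


if __name__ == "__main__":
    # Made-up sample rows, used only to exercise the functions.
    snapshot = b"date,region,cases\n2023-05-01,A,12\n"
    record = provenance_record(
        snapshot, source="hypothetical-case-feed", version="v1"
    )
    print(json.dumps(record, indent=2))

    # An unchanged snapshot verifies; a modified one does not.
    print(verify(snapshot, record))
    print(verify(snapshot + b"2023-05-02,A,15\n", record))
```

In practice such a record would probably be written next to the data file or into a catalog, so that model runs can cite the exact snapshot they used. Where it is stored and which fields it carries are design choices this sketch leaves open.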

NSF RESUME HPC Workshop: High-Performance Computing and Large-Scale Data Management in Service of Epidemiological Modeling
