Abstract: When conducting an in vivo experiment, researchers typically collect a diverse array of qualitative observations and quantitative measurements. Therefore, choosing the data that are most relevant for aggregation and tidying is a crucial first step (Fig 1). Care must be taken when combining data from multiple studies to determine which data points are most consistently collected between experiments (especially if different research staff are conducting the work). This may require excluding specific parameters from analysis (like lethargy or animal activity level) that may be more vulnerable to laboratorian bias, depending on the specific standardized assessment employed. To reduce experimental confounders, studies intended for aggregation should be conducted under conditions that are as uniform or standardized as possible (with these inclusion criteria explicitly stated within the analysis) [1–3]. As any research scientist will attest, in vivo-generated data are highly heterogeneous, particularly when using outbred species. Variability may be present in baseline (pre-inoculation) animal age, weight, temperature, activity level, blood chemistry, and innate immune response parameters, among others. Inoculation (e.g., infectious dose) and post-inoculation (e.g., specimen collection) variability can also be present. Because most studies assessing viral pathogenicity report changes relative to baseline, normalizing raw data to reflect a linear or percentage-based deviation from baseline will typically yield aggregate data with lower standard error and greater uniformity, and represents a best practice in the field [4]. Normalization can typically occur before or after aggregation. It is frequently desirable to contextualize in vivo-derived outcomes with genotypic data [5–8]; however, these data must be similarly curated before further analysis, with reliable consensus sequence data available for aggregation and use (Fig 2).
Will full-length genetic sequences be assessed, or will specific molecular residues that are known to affect the tested variable be sufficient [9]? Molecular residues are often compensatory in nature; will researchers build new data set columns with anticipated phenotypic outcomes from constellations of specific amino acids at key positions (like predicted receptor binding preference or length of an accessory protein)? If laboratory-generated data will be included, have researchers ensured reproducibility of aggregated experiments performed over time [10], with oversight for potential dual-use research of concern? Considering the scope of information that can be obtained from in vivo, in vitro, and molecular analyses, selecting input data for subsequent processing represents a challenging endeavor.
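The percentage-based normalization to baseline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, the study and animal identifiers, the weight values, and the choice of day 0 as the pre-inoculation baseline are all hypothetical. Each animal's measurements are re-expressed as a percent change from its own baseline, and animals are then pooled across studies by day.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (study_id, animal_id, day_post_inoculation, weight_g).
# All identifiers and values are invented for illustration only.
RECORDS = [
    ("study_A", "a1", 0, 1000.0), ("study_A", "a1", 2, 950.0), ("study_A", "a1", 4, 900.0),
    ("study_A", "a2", 0, 1200.0), ("study_A", "a2", 2, 1170.0), ("study_A", "a2", 4, 1140.0),
    ("study_B", "b1", 0, 800.0), ("study_B", "b1", 2, 780.0), ("study_B", "b1", 4, 760.0),
]


def percent_change_from_baseline(records, baseline_day=0):
    """Re-express each value as a percent change from that animal's baseline."""
    # Assumption: exactly one baseline measurement per (study, animal).
    baselines = {
        (study, animal): value
        for study, animal, day, value in records
        if day == baseline_day
    }
    normalized = []
    for study, animal, day, value in records:
        base = baselines.get((study, animal))
        if base is None or base == 0:
            continue  # no usable baseline: skip the record rather than guess
        normalized.append((study, animal, day, 100.0 * (value - base) / base))
    return normalized


def aggregate_by_day(normalized):
    """Pool animals across studies and summarize each day as (mean, sd, n)."""
    by_day = defaultdict(list)
    for _study, _animal, day, pct in normalized:
        by_day[day].append(pct)
    return {
        day: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for day, vals in sorted(by_day.items())
    }


summary = aggregate_by_day(percent_change_from_baseline(RECORDS))
for day, (avg, sd, n) in summary.items():
    print(f"day {day}: mean {avg:+.2f}% (sd {sd:.2f}, n={n})")
```

Here normalization happens before aggregation, so animals with very different starting weights contribute on a common scale. A linear deviation (value minus baseline) could be substituted by changing the single expression inside `percent_change_from_baseline`.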

Data alchemy, from lab to insight: Transforming in vivo experiments into data science gold
