Abstract: Background: Recent advancements in computational pathology have introduced deep learning methods to predict genomic, transcriptomic and molecular biomarkers from routine histology whole slide images (WSIs) for cancer diagnosis, prognosis, and treatment. However, existing methods often overlook the critical role of co-dependencies among biomarker statuses during training and inference. We hypothesize that this oversight results in models that predict the combined effect of multiple interdependent biomarkers rather than each individual status independently, akin to attributing the quality of an orchestral symphony to a single instrument. This points to a limitation of current predictors.

Methods: Using large datasets (n = 8,221 patients), we conducted statistical co-dependence testing to demonstrate significant interdependencies among biomarker statuses in training datasets. Following standard protocols, we trained two machine learning models to predict biomarkers from WSIs that reach or match state-of-the-art predictive performance. We then employed permutation testing and stratification analysis to evaluate their predictive quality based on the principle of conditional independence: if a model accurately captures the phenotypic influence of a specific biomarker independent of other biomarkers, its performance should remain consistent across subgroups of patients stratified by those other biomarkers, in line with its overall performance on the entire dataset.

Findings: Our statistical analysis reveals significant interdependencies among biomarkers. These reflect expected co-occurrence and mutual exclusivity patterns driven by pathological and biological processes, which are consistent across datasets, as well as sampling artefacts, which can differ across datasets. Our results indicate that the predictive quality of an image-based predictor for a biomarker is contingent on the status of other biomarkers, revealing that models capture aggregated influences rather than predicting individual statuses independently. For example, mutation predictions are confounded by the overall tumour mutation burden. We also show that, because of such correlations, deep learning models may not offer significant advantages for certain biomarkers over simply using pathologist-assigned grades to predict them.

Interpretation: We show that current deep learning models in computational pathology fall short in isolating individual biomarker effects, leading to confounded and less precise predictions. Our findings suggest revisiting model training protocols to recognize and adjust for biomarker interdependencies at all development stages, from problem definition to usage guidelines. This involves selecting diverse datasets that reflect clinical heterogeneity, defining prediction variables or grouping patients based on co-dependencies, designing models to disentangle complex relationships, and applying stringent stratification testing. Clinically, failure to account for interdependencies may lead to suboptimal decisions, necessitating appropriate usage guidelines for predictive models.
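The stratification analysis described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`auroc`, `stratified_auroc`, `permutation_gap_pvalue`), the choice of AUROC as the performance measure, the gap statistic, and the toy data are all assumptions made for illustration. The toy predictor's scores follow a second biomarker rather than the target, so overall performance looks better than chance while performance within each stratum of the second biomarker does not.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_auroc(labels, scores, strata):
    """AUROC for the target biomarker within each level of another biomarker."""
    result = {}
    for level in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == level]
        result[level] = auroc([labels[i] for i in idx],
                              [scores[i] for i in idx])
    return result


def permutation_gap_pvalue(labels, scores, strata, n_perm=1000, seed=0):
    """Permutation p-value for the largest |overall - stratum| AUROC gap.

    Assumed null hypothesis: stratum membership is unrelated to how well the
    predictor performs, so shuffling stratum labels should produce gaps at
    least as large as the observed one reasonably often.
    """
    rng = random.Random(seed)
    overall = auroc(labels, scores)

    def gap(assignment):
        per_stratum = stratified_auroc(labels, scores, assignment)
        return max(abs(overall - a) for a in per_stratum.values())

    observed = gap(strata)
    shuffled = list(strata)
    hits = valid = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        try:
            g = gap(shuffled)
        except ValueError:
            continue  # a shuffled stratum lost one class; skip this draw
        valid += 1
        hits += g >= observed
    return (hits + 1) / (valid + 1)


# Toy data (hypothetical): target status, predictor score, other biomarker.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.8, 0.8, 0.1, 0.3, 0.2, 0.2]
strata = [1, 1, 1, 1, 0, 0, 0, 0]

print(auroc(labels, scores))                     # 0.75
print(stratified_auroc(labels, scores, strata))  # {0: 0.5, 1: 0.5}
print(permutation_gap_pvalue(labels, scores, strata, n_perm=200))
```

In this toy case the overall AUROC of 0.75 drops to 0.5 (chance level) inside each stratum, which is the pattern the conditional independence principle is meant to expose. With real cohorts a library AUROC routine and a stratum-aware resampling scheme would likely be preferable to this quadratic-time sketch.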

Buyer Beware: confounding factors and biases abound when predicting omics-based biomarkers from histological images
