A pathology foundation model for cancer diagnosis and prognosis prediction

Xiyue Wang, Junhan Zhao, Eliana Marostica, Wei Yuan, Jietian Jin, Jiayu Zhang, Ruijiang Li, Hongping Tang, Kanran Wang, Yu Li, Fang Wang, Yulong Peng, Junyou Zhu, Jing Zhang, Christopher R Jackson, Jun Zhang, Deborah Dillon, Nancy U Lin, Lynette Sholl, Thomas Denize, David Meredith, Keith L Ligon, Sabina Signoretti, Shuji Ogino, Jeffrey A Golden, MacLean P Nasrallah, Xiao Han, Sen Yang, Kun-Hsing Yu
DOI: https://doi.org/10.1038/s41586-024-07894-z
IF: 64.8
2024-09-04
Nature
Abstract: Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task [1,2]. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations [3]. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high-resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using 19,491 whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer.
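The abstract does not describe how tile-level features are combined into a whole-slide representation. As a rough illustration only, the sketch below shows one common weakly supervised approach, attention-based pooling over tile embeddings. It is not CHIEF's actual architecture. The names `attention_pool`, `V` and `w`, the dimensions and the random inputs are all assumptions introduced for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tile_features, V, w):
    """Combine tile embeddings into one slide-level embedding.

    tile_features: list of n_tiles vectors, each of length d
    V: d x hidden projection matrix (hypothetical learned parameter)
    w: hidden-length scoring vector (hypothetical learned parameter)
    Returns (slide_embedding, attention_weights).
    """
    scores = []
    for feat in tile_features:
        # Project the tile embedding and squash with tanh.
        hidden = [
            math.tanh(sum(f * V[i][j] for i, f in enumerate(feat)))
            for j in range(len(w))
        ]
        # One scalar relevance score per tile.
        scores.append(sum(h * wj for h, wj in zip(hidden, w)))

    # Normalize scores so the tile weights sum to 1.
    weights = softmax(scores)

    # Slide embedding is the attention-weighted average of tile embeddings.
    dim = len(tile_features[0])
    slide_embedding = [
        sum(a * feat[k] for a, feat in zip(weights, tile_features))
        for k in range(dim)
    ]
    return slide_embedding, weights


# Toy data standing in for pretrained tile embeddings (assumed sizes).
random.seed(0)
n_tiles, d, hidden_dim = 8, 16, 4
tile_features = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tiles)]
V = [[random.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(d)]
w = [random.gauss(0, 1) for _ in range(hidden_dim)]

slide_embedding, weights = attention_pool(tile_features, V, w)
```

In a setup like this, only a slide-level label would be needed to train `V` and `w`, which is what makes the supervision weak: no tile is annotated individually, and the attention weights indicate which tiles contributed most to the slide-level output.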