Long deletion signatures in repetitive genomic regions track somatic evolution and enable sensitive detection of microsatellite instability

Qingli Guo, Jacob Househam, Eszter Lakatos, Salpie Nowinski, Ibrahim Al Bakir, Heather Grant, Vickna Balarajah, Christine S. Hughes, Luis Zapata, Hemant Kocher, Andrea Sottoriva, Ann-Marie Baker, Ville Mustonen, Trevor Graham
DOI: https://doi.org/10.1101/2024.10.03.616572
2024-10-04
Abstract: Deficiency in the mismatch repair system (MMRd) causes microsatellite instability (MSI) in cancers and determines eligibility for immunotherapy. Here, we show that MMRd tumours harbour long-deletion signatures (≥2-5+ base pairs deleted in repetitive regions), which provide new insights into MSI evolution and enable sensitive MSI detection, particularly in challenging clinical samples. Long deletions, accumulated through stepwise DNA slippage errors, are significantly more prevalent in metastatic MMRd tumours than in primary tumours. Importantly, we show that long-deletion signatures harbour features that are distinct from background noise, making them robustly detectable even in shallow whole genome sequencing (sWGS, ~0.1X coverage) of formalin-fixed samples. We constructed a machine learning classifier that uses these distinct features to detect Microsatellite Instability in LOw-quality (MILO) samples. MILO achieved 100% accuracy in detecting MSI in sWGS data with only 2%-15% tumour purity and demonstrated promise in identifying MMRd clones in precancerous intestinal lesions. We propose that MILO could be clinically used for the sensitive monitoring of MMRd cancer evolution from early to late stages, using minimal sequencing data from both archival and fresh-frozen samples with low tumour content.
Bioinformatics