Predicting cancer content in tiles of lung squamous cell carcinoma tumours with validation against pathologist labels

Salma Dammak, Matthew J. Cecchini, Jennifer Coats, Katherina Baranova, Aaron D. Ward
DOI: https://doi.org/10.1016/j.compbiomed.2024.109489
IF: 7.7
2024-12-06
Computers in Biology and Medicine
Abstract:
Background: A growing body of research uses deep learning to explore the relationship between treatment biomarkers for lung cancer patients and cancer tissue morphology on digitized whole slide images (WSIs) of tumour resections. However, these WSIs typically contain non-cancer tissue, which introduces noise during model training. Because digital pathology models typically begin by splitting WSIs into tiles, we propose a model that can be used to exclude non-cancer tiles from the WSIs of lung squamous cell carcinoma (SqCC) tumours.
Methods: We obtained 116 WSIs of tumours from 35 different centres from The Cancer Genome Atlas. A pathologist completed or reviewed cancer contours in four regions of interest (ROIs) within each WSI. We then split the ROIs into tiles, labelled each tile with the percentage of cancer tissue within it, trained a VGG16 network to predict this value, and calculated the regression error. To measure classification performance and visualize the classification results, we thresholded the predictions and calculated the area under the receiver operating characteristic curve (AUC).
Results: The model's median regression error was 4%, with a standard deviation of 35%. At a cancer threshold of 50%, the model had an AUC of 0.83. False positives tended to occur in tissues surrounding cancer, in tiles with <50% cancer, and in areas with high immune activity. False negatives tended to occur in areas with microtomy defects.
Conclusions: With further validation for each specific research application, the model described in this paper could facilitate the development of more effective research pipelines for predicting treatment biomarkers for lung SqCC.
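The tile-labelling and evaluation steps summarized in the Methods can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical; the pathologist's ROI contours are assumed to be already rasterized into a binary mask; partial tiles at the ROI edges are assumed to be dropped; regression error is assumed to be the signed difference (predicted minus reference percentage); and for the AUC the reference labels are assumed to be binarized at the 50% threshold (tiles with at least 50% cancer counted as positive) while the continuous predictions serve as scores. The VGG16 model itself is not shown, and the predictions in the usage example are made-up numbers.

```python
import statistics
from typing import List, Sequence, Tuple

# Hypothetical representation: 1 = pixel inside the pathologist's cancer contour, 0 = outside.
Mask = List[List[int]]


def split_into_tiles(mask: Mask, tile_size: int) -> List[Mask]:
    """Split a rasterized ROI mask into non-overlapping square tiles (row-major order).

    Assumption: partial tiles at the right/bottom edges are discarded.
    """
    tiles = []
    rows, cols = len(mask), len(mask[0])
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            tiles.append([row[c:c + tile_size] for row in mask[r:r + tile_size]])
    return tiles


def cancer_percentage(tile: Mask) -> float:
    """Regression target: percentage of the tile's pixels inside the cancer contour."""
    total = sum(len(row) for row in tile)
    return 100.0 * sum(sum(row) for row in tile) / total


def error_summary(pred: Sequence[float], true: Sequence[float]) -> Tuple[float, float]:
    """Median and sample standard deviation of signed errors (pred - true), in percentage points."""
    errors = [p - t for p, t in zip(pred, true)]
    return statistics.median(errors), statistics.stdev(errors)


def auc_at_threshold(pred: Sequence[float], true: Sequence[float], threshold: float = 50.0) -> float:
    """AUC with reference tiles binarized at `threshold` percent cancer.

    Mann-Whitney formulation: the probability that a randomly chosen positive
    tile receives a higher predicted percentage than a randomly chosen negative
    tile, with ties counted as one half.
    """
    pos = [p for p, t in zip(pred, true) if t >= threshold]
    neg = [p for p, t in zip(pred, true) if t < threshold]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative tile")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage on a toy 4x4 ROI split into 2x2 tiles.
roi = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0],
       [1, 1, 1, 0]]
tiles = split_into_tiles(roi, 2)
targets = [cancer_percentage(t) for t in tiles]   # reference labels per tile
preds = [90.0, 60.0, 70.0, 55.0]                  # made-up model outputs
median_err, sd_err = error_summary(preds, targets)
auc = auc_at_threshold(preds, targets)
print(targets, median_err, round(sd_err, 2), round(auc, 3))
```

The pairwise AUC loop is quadratic in the number of tiles, which is fine for a sketch; for thousands of tiles a rank-based implementation such as scikit-learn's `roc_auc_score` would be the usual choice.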
Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Biology