Classification of cancer pathology reports: a large-scale comparative study

Stefano Martina, Leonardo Ventura, Paolo Frasconi

DOI: https://doi.org/10.1109/JBHI.2020.3005016

2020-06-30

Abstract: We report about the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) and with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We found that in this context hierarchical models are not better than flat models and that an element-wise maximum aggregator is slightly better than attentive models on site classification. Moreover, the maximum aggregator offers a way to interpret the classification process.

Machine Learning, Computation and Language, Image and Video Processing

What problem does this paper attempt to address?

The paper addresses the automatic and interpretable assignment of ICD-O3 (International Classification of Diseases for Oncology, 3rd Edition) topography (anatomical site) and morphology codes to free-text cancer pathology reports. The authors apply state-of-the-art deep learning techniques to this task and compare different model architectures on one large dataset in terms of prediction accuracy and interpretability. The reports are written in Italian and were collected from hospitals in Tuscany over more than a decade; the dataset contains more than 80,000 labeled reports and 1,500,000 unlabeled reports.

The main objectives of the paper include:

1. **Improve the level of automation**: Reduce the time and resources spent on manual review of pathology reports through machine learning, thereby speeding up the definition of cancer cases and supporting public health decision-making.
2. **Enhance classification accuracy**: Compare the performance of different deep learning models (such as GRU, GRU with attention mechanism, BERT, and CNN) on multi-class classification tasks, specifically the assignment of topography sites (61 classes) and morphology types (134 classes).
3. **Strengthen the interpretability of the model**: Explore how model structure design (for example, using an element-wise maximum aggregator instead of an attention mechanism) can make predictions easier to interpret, so that human experts can review the automatic classification results.

Through these studies, the paper aims to provide an effective tool that automatically extracts key information from free-text pathology reports to support cancer registration and public health surveillance.
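The element-wise maximum aggregator mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the token vectors below are invented toy values (in the paper they would come from a learned encoder), the English tokens are for readability only, and the helper names `max_aggregate` and `token_relevance` are hypothetical. Counting how many dimensions each token "wins" is one plausible way to read off an interpretation, assumed here for illustration.

```python
from typing import List, Sequence, Tuple


def max_aggregate(
    token_vectors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int]]:
    """Element-wise maximum over a sequence of token vectors.

    Returns the pooled vector and, for each dimension, the index of the
    token that supplied the maximum (the first token wins ties).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    pooled: List[float] = []
    winners: List[int] = []
    for d in range(dim):
        column = [vec[d] for vec in token_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def token_relevance(winners: Sequence[int], n_tokens: int) -> List[float]:
    """Fraction of pooled dimensions whose maximum came from each token."""
    counts = [0] * n_tokens
    for w in winners:
        counts[w] += 1
    return [c / len(winners) for c in counts]


# Toy input: three tokens with invented 4-dimensional representations.
tokens = ["carcinoma", "of", "colon"]
vectors = [
    [0.9, 0.1, 0.8, 0.2],
    [0.0, 0.2, 0.1, 0.1],
    [0.3, 0.7, 0.2, 0.6],
]

pooled, winners = max_aggregate(vectors)
relevance = token_relevance(winners, len(tokens))

for token, score in zip(tokens, relevance):
    print(f"{token}: {score:.2f}")
```

In this toy case the function word contributes no dimension to the pooled vector, while the two content words split the dimensions between them. Unlike attention weights, which are produced by a separate scoring component, these counts follow directly from the pooling operation itself.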
