Novel Pipeline for Diagnosing Acute Lymphoblastic Leukemia Sensitive to Related Biomarkers

Amirhossein Askari-Farsangi, Ali Sharifi-Zarchi, Mohammad Hossein Rohban
2023-07-11
Abstract: Acute Lymphoblastic Leukemia (ALL) is one of the most common types of childhood blood cancer. The quick start of the treatment process is critical to saving the patient's life, and for this reason, early diagnosis of this disease is essential. Examining the blood smear images of these patients is one of the methods used by expert doctors to diagnose this disease. Deep learning-based methods have advanced significantly in recent years and have numerous applications in medical fields. ALL diagnosis is no exception, and several machine learning-based methods for this problem have been proposed. Previous methods reported high diagnostic accuracy, but our work showed that this alone is not sufficient, as it can lead to models taking shortcuts and not making meaningful decisions. This issue arises due to the small size of medical training datasets. To address this, we constrained our model to follow a pipeline inspired by experts' work. We also demonstrated that, since a judgement based on only one image is insufficient, redefining the problem as a multiple-instance learning problem is necessary for achieving a practical result. Our model is the first to provide a solution to this problem in a multiple-instance learning setup. We introduced a novel pipeline for diagnosing ALL that approximates the process used by hematologists, is sensitive to disease biomarkers, and achieves an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on ALL IDB 1. Our method was further evaluated on an out-of-distribution dataset, a challenging test on which it showed acceptable performance. Notably, our model was trained on a relatively small dataset, highlighting the potential for our approach to be applied to other medical datasets with limited data availability.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address the early diagnosis of acute lymphoblastic leukemia (ALL), a common blood cancer in children. Specifically, it focuses on how to diagnose ALL effectively from blood smear images. Early diagnosis is crucial for treatment, so developing an accurate and reliable diagnostic method is of great significance.

The paper points out that although previous studies have proposed several machine learning-based methods for ALL diagnosis and reported high diagnostic accuracy, these methods often rely on shortcuts in the dataset rather than making decisions based on medically meaningful features. This problem is especially pronounced with small medical datasets. To overcome this challenge, the authors propose a new pipeline that not only achieves high diagnostic accuracy but also ensures that the model relies on biomarkers related to the disease, thereby making more meaningful judgments. The main innovations of the model include:

1. **Multiple-instance learning framework**: Since a single image is insufficient for an accurate diagnosis, the authors redefine the problem as a multiple-instance learning problem, making the final diagnosis by analyzing multiple images from the same patient.
2. **Simulating expert workflow**: The model's design is inspired by the workflow of hematologists and includes four main steps: detecting white blood cells, analyzing each cell, summarizing results, and making the final decision.
3. **Sensitivity to biomarkers**: Through special training methods, the model can identify and rely on key biomarkers (such as blast cells), thereby improving the reliability of the diagnosis.

Experimental results show that the model achieved an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on the ALL IDB 1 dataset, and showed acceptable performance on an out-of-distribution dataset.
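To make the four reported metrics concrete, the following minimal sketch computes them from a binary confusion matrix. The counts used here are illustrative assumptions, not figures taken from the paper: they were chosen only because they happen to reproduce the reported percentages. The sketch also assumes the F1-score is macro-averaged over the two classes, which the source does not state; with these counts, the macro average matches the reported 94.24%, whereas the F1-score of the positive class alone would equal 97.56%.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, specificity and macro-averaged F1."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # recall on the positive (ALL) class
    specificity = tn / (tn + fp)  # recall on the negative (healthy) class
    f1_pos = 2 * tp / (2 * tp + fp + fn)  # F1 treating ALL as positive
    f1_neg = 2 * tn / (2 * tn + fn + fp)  # F1 treating healthy as positive
    macro_f1 = (f1_pos + f1_neg) / 2      # assumption: macro averaging
    return {
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (not reported in the paper).
metrics = binary_metrics(tp=40, fn=1, tn=10, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 96.15%
# macro_f1: 94.24%
# sensitivity: 97.56%
# specificity: 90.91%
```

Reporting sensitivity and specificity alongside accuracy matters on small, imbalanced test sets, where a single misclassified healthy sample can move specificity by several points.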
Additionally, the model's diagnostic performance dropped significantly after blast cells were removed, further validating its sensitivity to key biomarkers. In summary, this study proposes a new method for ALL diagnosis that combines high diagnostic accuracy with improved reliability and interpretability, supporting its potential for clinical use.
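The four-step workflow and the blast-removal check described above can be illustrated with a toy, rule-based sketch. This is not the authors' model, which is learned from data. Every name here (`Cell`, `detect_cells`, `analyze_cell`, `summarize`, `diagnose`, `remove_blasts`) and both thresholds are hypothetical, and each "image" is represented simply as a list of already-scored cells rather than pixels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    blast_score: float  # hypothetical per-cell score in [0, 1]


Image = List[Cell]  # stand-in: an image is the list of cells it contains


def detect_cells(image: Image) -> List[Cell]:
    """Step 1 stand-in: a real system would detect white blood cells in pixels."""
    return list(image)


def analyze_cell(cell: Cell, cell_threshold: float = 0.5) -> bool:
    """Step 2 stand-in: flag a cell as blast-like (threshold is an assumption)."""
    return cell.blast_score >= cell_threshold


def summarize(flags: List[bool]) -> float:
    """Step 3: fraction of blast-like cells across all images of one patient."""
    return sum(flags) / len(flags) if flags else 0.0


def diagnose(bag: List[Image], bag_threshold: float = 0.2) -> bool:
    """Step 4: one patient-level decision for the whole bag of images."""
    flags = [analyze_cell(c) for image in bag for c in detect_cells(image)]
    return summarize(flags) >= bag_threshold


def remove_blasts(bag: List[Image], cell_threshold: float = 0.5) -> List[Image]:
    """Ablation: drop every blast-like cell from every image in the bag."""
    return [[c for c in image if c.blast_score < cell_threshold] for image in bag]


# One hypothetical patient with two images (a "bag" of instances).
patient_bag = [
    [Cell(0.9), Cell(0.1), Cell(0.8)],
    [Cell(0.2), Cell(0.95)],
]
print(diagnose(patient_bag))                 # True
print(diagnose(remove_blasts(patient_bag)))  # False
```

In this sketch the decision flips once the blast-like cells are removed, which is the behavior one would expect from a pipeline that depends on the biomarker rather than on dataset shortcuts.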