Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification

Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan
2024-08-14
Abstract: Accurate classification of blood cells is vital to the diagnosis of hematological disorders. However, in real-world scenarios, domain shifts caused by variability in laboratory procedures and settings rapidly degrade a model's generalization performance. To address this issue, we propose a novel framework of domain-invariant representation learning (DoRL) via the segment anything model (SAM) for blood cell classification. DoRL comprises two main components: a LoRA-based SAM (LoRA-SAM) and a cross-domain autoencoder (CAE). The advantage of DoRL is that it can extract domain-invariant representations from various blood cell datasets in an unsupervised manner. Specifically, we first leverage the large-scale foundation model SAM, fine-tuned with LoRA, to learn general image embeddings and segment blood cells. We then introduce CAE to learn domain-invariant representations across datasets from different domains while mitigating image artifacts. To validate the effectiveness of the domain-invariant representations, we employ five widely used machine learning classifiers to construct blood cell classification models. Experimental results on two public blood cell datasets and a private real-world dataset demonstrate that DoRL achieves new state-of-the-art cross-domain performance, surpassing existing methods by a significant margin. The source code is available at https://github.com/AnoK3111/DoRL.
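The abstract states that SAM is fine-tuned with LoRA but does not give the adapter configuration. As general background, LoRA (low-rank adaptation) keeps a pretrained weight matrix W frozen and learns a low-rank update, so an adapted linear layer computes Wx + (alpha / r) * B(Ax), with A of shape r x d_in and B of shape d_out x r. The pure-Python sketch below illustrates that generic mechanism for a single layer. The class name `LoRALinear`, the rank, the scaling `alpha`, and the initialization are illustrative assumptions; which SAM layers DoRL actually adapts is not specified in this summary.

```python
# Generic LoRA sketch for one linear layer (illustrative assumption, not DoRL's code).
import random


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(x, y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


class LoRALinear:
    """Computes W x + (alpha / r) * B (A x) with W frozen; only A and B would be trained."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, shape d_out x d_in
        self.scale = alpha / rank     # LoRA scaling factor alpha / r
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapter initially leaves the pretrained layer unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self):
        """Fold the low-rank update into W, e.g. for inference."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def num_trainable(self):
        """Number of entries in A and B, the only parameters LoRA trains."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
layer = LoRALinear(W, rank=2, alpha=4.0)
print(layer.forward([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0] because B starts at zero
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and after training the update can be folded into W with `merged_weight`, so inference needs no extra computation. For a 64 x 64 layer with rank 4, only 512 of the 4,096 weight entries would be trainable.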
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses domain shift in blood cell classification. Blood cell classification is clinically important because it helps identify and monitor a range of blood diseases. In practice, however, differences in laboratory procedures and settings produce discrepancies between datasets from different domains (i.e., domain shift), which can severely degrade a model's generalization performance. To tackle this, the authors propose Domain-invariant Representation Learning (DoRL), a framework that extracts domain-invariant features by combining a LoRA-fine-tuned Segment Anything Model (LoRA-SAM) with a cross-domain autoencoder (CAE). The main contributions of DoRL are:

1. Introducing the large-scale foundation model SAM for blood cell image segmentation, fine-tuned with LoRA to improve segmentation performance.
2. Proposing a framework that combines LoRA-SAM and CAE to extract domain-invariant features from blood cell datasets of different domains in an unsupervised manner, while mitigating domain-specific artifacts in the images.
3. Showing experimentally that DoRL significantly outperforms existing methods in cross-domain classification on two public datasets and one private real-world dataset.

In summary, the study aims to improve the generalization of blood cell classification models, particularly when training and test data come from different domains.
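The summary reports cross-domain classification results but not the exact evaluation protocol. A common way to measure cross-domain performance is to fit a classifier on features from one domain and score it on another. The sketch below does this for every ordered pair of domains, using a nearest-centroid classifier as a hypothetical stand-in for the five classifiers mentioned in the abstract, on toy 2-D "features" in which domain B is domain A shifted along one axis. The per-domain mean-centering step is a deliberately crude alignment used only to show why domain-invariant features help; it is not the CAE, whose architecture this summary does not describe. All function names, labels, and data are hypothetical.

```python
# Hypothetical cross-domain evaluation sketch (not DoRL's code or data).
import math


def fit_centroids(features, labels):
    """Nearest-centroid 'training': mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums.setdefault(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids, features, labels):
    correct = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return correct / len(labels)


def cross_domain_accuracy(domains):
    """domains: name -> (features, labels). Train on each source, test on every other target."""
    results = {}
    for src, (xs, ys) in domains.items():
        model = fit_centroids(xs, ys)
        for tgt, (xt, yt) in domains.items():
            if tgt != src:
                results[(src, tgt)] = accuracy(model, xt, yt)
    return results


def center_per_domain(features):
    """Crude alignment (NOT the CAE): subtract the domain's mean feature vector."""
    dim = len(features[0])
    mean = [sum(x[i] for x in features) / len(features) for i in range(dim)]
    return [[v - m for v, m in zip(x, mean)] for x in features]


# Toy data: domain B is domain A shifted by +3 along the first axis,
# mimicking a lab-specific shift in the feature space.
labels = ["lymphocyte", "lymphocyte", "neutrophil", "neutrophil"]
domain_a = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
domain_b = [[x + 3.0, y] for x, y in domain_a]
raw = {"A": (domain_a, labels), "B": (domain_b, labels)}
centered = {k: (center_per_domain(xs), ys) for k, (xs, ys) in raw.items()}

print(cross_domain_accuracy(raw))       # {('A', 'B'): 0.5, ('B', 'A'): 0.5}
print(cross_domain_accuracy(centered))  # {('A', 'B'): 1.0, ('B', 'A'): 1.0}
```

On the raw toy features, the shift moves half of each target domain across the decision boundary, so accuracy drops to 0.5 in both directions; once the domain-specific offset is removed, the two domains coincide and accuracy returns to 1.0. Mean-centering only works here because the shift is a pure translation and the classes are balanced, which is exactly the kind of simplification a learned domain-invariant representation is meant to avoid.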