LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image

Chetan Madan, Mayuna Gupta, Soumen Basu, Pankaj Gupta, Chetan Arora
2024-11-30
Abstract: We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images. The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations. Tackling such challenges would necessitate precise localization performance by the DNN to identify the discerning features for the downstream malignancy prediction. While several techniques have been proposed in recent years for the problem, all of these methods employ complex custom architectures. Inspired by the success of foundational models for natural image tasks, along with the use of adapters to fine-tune such models for custom tasks, we investigate the merit of one such design, ViT-Adapter, for the GBC detection problem. We observe that ViT-Adapter relies predominantly on a primitive CNN-based spatial prior module to inject the localization information via cross-attention, which is inefficient for our problem due to the small pathology sizes and the variability in their appearance caused by the non-regular structure of the malignancy. In response, we propose LQ-Adapter, a modified Adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module. Our method surpasses existing approaches, enhancing the mean IoU (mIoU) scores by 5.4%, 5.8%, and 2.7% over ViT-Adapters, DINO, and FocalNet-DINO, respectively, on the US image-based GBC detection dataset, and establishing a new state-of-the-art (SOTA). Additionally, we validate the applicability and effectiveness of LQ-Adapter on the Kvasir-Seg dataset for polyp detection from colonoscopy images. Superior performance of our design on this problem as well showcases its capability to handle diverse medical imaging tasks across different datasets.
Code is released at https://github.com/ChetanMadan/LQ-Adapter
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?
The paper addresses the detection of Gallbladder Cancer (GBC) from ultrasound (US) images. Ultrasound images are typically of low quality: noise, textures, and viewpoint variations make it difficult for modern deep neural network (DNN) techniques to localize and identify malignancies accurately. Specifically:

1. **Image Quality Problems**: Ultrasound images suffer from noise, artifacts (such as shadows or echo textures), and viewpoint changes caused by hand-held sensors.
2. **Small and Irregular Lesion Areas**: Gallbladder cancer usually occupies a small part of the image, and its appearance varies because the pathological structure is irregular.
3. **Limitations of Existing Methods**: Several methods have been proposed, but they rely on complex custom architectures that are difficult to apply to other datasets or tasks.

To address these problems, the authors propose LQ-Adapter (Learnable Queries Adapter), a modified adapter design that augments the spatial prior module of ViT-Adapter with learnable content queries, thereby improving the quality of the localization information. On the US image-based GBC detection dataset, LQ-Adapter improves mIoU by 5.4%, 5.8%, and 2.7% over ViT-Adapter, DINO, and FocalNet-DINO, respectively. LQ-Adapter is also effective for polyp detection in colonoscopy images on the Kvasir-Seg dataset, which indicates that the design applies to other medical imaging tasks.
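The reported gains are in mean IoU (mIoU). As a rough illustration of that metric, the sketch below computes box-level IoU and averages it over image pairs. This summary does not state the paper's exact evaluation protocol, so the pairing of one predicted box with one ground-truth box per image, the `(x1, y1, x2, y2)` box format, and the function names `box_iou` and `mean_iou` are all assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds, targets):
    """Average IoU over paired predictions and ground-truth boxes.

    Assumption: one predicted box and one ground-truth box per image.
    """
    scores = [box_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Two unit-offset 2x2 boxes overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1).
example_iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Under this reading, an improvement of a few mIoU points means the predicted boxes overlap the annotated regions more tightly on average.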
### Formula Representation

Some of the key formulas involved in the paper are presented below.

- **Self-Attention Mechanism**:

  \[ \text{Attention}(Q, K, V)=\text{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V \]

  where \(Q\), \(K\), and \(V\) are the query, key, and value, obtained by linear transformations of the feature \(X\), and \(d_{k}\) is the key dimension.

- **Spatial Prior Injector**:

  \[ F_{i}^{\text{vit}}=F_{i}^{\text{vit}}+\gamma_{i}\cdot\text{Attention}(\text{norm}(F_{i}^{\text{vit}}), \text{norm}(F_{i}^{\text{sp}})) \]

- **Cross-Attention in the Extraction Module**:

  \[ F_{i}^{\text{sp}}=\text{Attention}(\text{norm}(F_{i}^{\text{vit}}), \text{norm}(F_{i}^{\text{sp}})) \]

  \[ F_{i + 1}^{\text{sp}}=F_{i}^{\text{sp}}+\text{FFN}(\text{norm}(F_{i}^{\text{sp}})) \]

  \[ LQ_{i}=\text{Attention}(\text{norm}(LQ_{i}), \text{norm}(F_{i}^{\text{vit}})) \]

  \[ LQ_{i + 1}=LQ_{i}+\text{Attention}(\text{norm}(F_{i + 1}^{\text{sp}}), \text{norm}(LQ_{i})) \]

These formulas show how LQ-Adapter strengthens the model's localization ability through cross-attention and learnable queries, so that it handles Gallbladder Cancer detection in ultrasound images more effectively.
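The attention formula and the spatial prior injector update can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it uses single-head dense attention with no learned projections, a parameter-free layer normalization for `norm`, and it reads the two-argument form `Attention(A, B)` as "queries from `A`, keys and values from `B`". The helper names `attention`, `layer_norm`, and `inject` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ v


def layer_norm(x, eps=1e-6):
    # Parameter-free normalization over the feature dimension (assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def inject(f_vit, f_sp, gamma):
    """Residual update F_vit + gamma * Attention(norm(F_vit), norm(F_sp)).

    Assumption: F_vit supplies the queries, F_sp supplies keys and values.
    """
    q = layer_norm(f_vit)
    kv = layer_norm(f_sp)
    return f_vit + gamma * attention(q, kv, kv)


rng = np.random.default_rng(0)
f_vit = rng.normal(size=(16, 8))  # 16 ViT tokens, 8 channels (toy sizes)
f_sp = rng.normal(size=(21, 8))   # 21 spatial-prior tokens, 8 channels
out = inject(f_vit, f_sp, gamma=0.0)
```

With `gamma = 0` the update reduces to the identity, so the ViT features pass through unchanged; a nonzero `gamma` scales how much spatial-prior information is mixed into each token.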