Abstract: We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images. The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations. Tackling such challenges requires precise localization by the DNN to identify the discerning features for the downstream malignancy prediction. While several techniques have been proposed in recent years for the problem, all of these methods employ complex custom architectures. Inspired by the success of foundational models for natural image tasks, along with the use of adapters to fine-tune such models for custom tasks, we investigate the merit of one such design, ViT-Adapter, for the GBC detection problem. We observe that ViT-Adapter relies predominantly on a primitive CNN-based spatial prior module to inject the localization information via cross-attention, which is inefficient for our problem due to the small pathology sizes and the variability in their appearance caused by the non-regular structure of the malignancy. In response, we propose LQ-Adapter, a modified adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module. Our method surpasses existing approaches, improving the mean IoU (mIoU) scores by 5.4%, 5.8%, and 2.7% over ViT-Adapter, DINO, and FocalNet-DINO, respectively, on the US image-based GBC detection dataset, and establishes a new state-of-the-art (SOTA). Additionally, we validate the applicability and effectiveness of LQ-Adapter on the Kvasir-Seg dataset for polyp detection from colonoscopy images. The superior performance of our design on this problem as well showcases its capability to handle diverse medical imaging tasks across different datasets.
Code is released at https://github.com/ChetanMadan/LQ-Adapter

What problem does this paper attempt to address?

The paper addresses the detection of Gallbladder Cancer (GBC) from ultrasound images. Ultrasound images are typically of low quality, which makes it difficult for modern deep neural network (DNN) techniques to localize and identify the malignancy accurately. Specifically:

1. **Image Quality Problems**: Noise, artifacts (such as shadows or echo textures), and view-angle changes caused by hand-held sensors degrade the quality of ultrasound images.
2. **Small and Irregular Lesion Areas**: Gallbladder cancer usually occupies a small part of the image, and its appearance varies because of the irregular structure of the pathology.
3. **Limitations of Existing Methods**: Several existing methods have been proposed, but they rely on complex custom architectures and are difficult to apply to other datasets or tasks.

To address these problems, the authors propose an improved adapter design, LQ-Adapter (Learnable Queries Adapter), which enhances the spatial prior module of ViT-Adapter with learnable content queries and thereby improves the quality of the localization information. On the US image-based GBC detection dataset, LQ-Adapter improves the mIoU score by 5.4%, 5.8%, and 2.7% over ViT-Adapter, DINO, and FocalNet-DINO, respectively. LQ-Adapter is also effective for polyp detection in colonoscopy images on the Kvasir-Seg dataset, which indicates its applicability to different medical imaging tasks.
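To make the reported metric concrete, the following is a minimal sketch of how a mean IoU over bounding boxes could be computed. It assumes one predicted box and one ground-truth box per image, each given as `(x1, y1, x2, y2)`; the function names `box_iou` and `mean_iou` are illustrative, and the paper's exact evaluation protocol is not described in this summary.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds, gts):
    """Average IoU over paired predictions and ground-truth boxes.

    Assumption: exactly one predicted box per ground-truth box.
    """
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and equally long")
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (0, 0, 1, 1)]
    gts = [(0, 0, 2, 2), (2, 2, 3, 3)]
    # First pair overlaps perfectly, second pair is disjoint.
    print(mean_iou(preds, gts))
```

Under this reading, a gain of a few mIoU points means the predicted boxes overlap the annotated lesion regions more closely on average.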
### Formula Representation

To ensure the correctness and readability of the formulas, some of the key formulas involved in the paper are presented in Markdown format as follows:

- **Self-Attention Mechanism**:

  \[ \text{Attention}(Q, K, V)=\text{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V \]

  where \(Q\), \(K\), and \(V\) are the query, key, and value, respectively, obtained by linear transformations of the feature \(X\), and \(d_{k}\) is the key dimension.

- **Spatial Prior Injector**:

  \[ F_{i}^{vit}=F_{i}^{vit}+\gamma_{i}\cdot\text{Attention}(\text{norm}(F_{i}^{vit}), \text{norm}(F_{i}^{sp})) \]

- **Cross-Attention in the Extraction Module**:

  \[ F_{i}^{sp}=\text{Attention}(\text{norm}(F_{i}^{vit}), \text{norm}(F_{i}^{sp})) \]

  \[ F_{i + 1}^{sp}=F_{i}^{sp}+\text{FFN}(\text{norm}(F_{i}^{sp})) \]

  \[ LQ_{i}=\text{Attention}(\text{norm}(LQ_{i}), \text{norm}(F_{i}^{vit})) \]

  \[ LQ_{i + 1}=LQ_{i}+\text{Attention}(\text{norm}(F_{i + 1}^{sp}), \text{norm}(LQ_{i})) \]

These formulas show how LQ-Adapter enhances the model's localization ability through the cross-attention mechanism and learnable queries, so as to handle the Gallbladder Cancer detection task in ultrasound images more effectively.
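The attention and injector formulas above can be sketched in a few lines of NumPy. This is an illustrative, single-head sketch and not the authors' implementation. It assumes that the two-argument `Attention(A, B)` takes its queries from `A` and its keys and values from `B`, that `norm` is a layer normalization without learnable parameters, and that the projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the feature width are arbitrary placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    # Normalization over the feature axis; no learnable scale or shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention(q, k, v):
    # Softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k), axis=-1) @ v


def cross_attention(x_query, x_context, w_q, w_k, w_v):
    # Assumed reading: queries from x_query, keys/values from x_context.
    return attention(x_query @ w_q, x_context @ w_k, x_context @ w_v)


def inject(f_vit, f_sp, gamma, w_q, w_k, w_v):
    # F_vit <- F_vit + gamma * Attention(norm(F_vit), norm(F_sp))
    update = cross_attention(layer_norm(f_vit), layer_norm(f_sp), w_q, w_k, w_v)
    return f_vit + gamma * update


rng = np.random.default_rng(0)
d = 8                                  # hypothetical feature width
f_vit = rng.normal(size=(16, d))       # 16 ViT tokens (placeholder)
f_sp = rng.normal(size=(21, d))        # 21 spatial-prior tokens (placeholder)
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

# With gamma = 0 the injector is an identity on the ViT features.
out_zero = inject(f_vit, f_sp, np.zeros(d), w_q, w_k, w_v)
out_one = inject(f_vit, f_sp, np.ones(d), w_q, w_k, w_v)
print(out_zero.shape, np.allclose(out_zero, f_vit))
```

The gating vector `gamma` controls how strongly the spatial-prior information is mixed into the ViT features; the learnable-query updates in the last two formulas reuse the same `cross_attention` pattern with `LQ` tokens in place of one of the inputs.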

LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image
