Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity Assessment

Aymen Sekhri, Marouane Tliba, Mohamed Amine Kerkouri, Yassine Nasser, Aladine Chetouani, Alessandro Bruno, Rachid Jennane
2024-03-15
Abstract: Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through its hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation on two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses severity assessment of knee osteoarthritis (KOA), which traditionally relies on manual inspection and is prone to delays and inconsistencies. Deep learning can automate diagnosis and improve accuracy, but vision models often favor global context and overlook local details. The paper uses the Swin Transformer to capture long-range spatial dependencies in the images and refines local feature representations with a Negative Cosine Similarity Loss (NCSL), so that local features become more detailed and more relevant to the diagnosis while global context is preserved. According to the reported experiments, this improves the accuracy and reliability of KOA grade classification.
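To make the NCSL idea concrete, here is a minimal sketch of a negative cosine similarity loss in plain Python. It is not the authors' implementation: the function names `negative_cosine_similarity` and `ncsl`, the assumption that each stage's local feature has already been projected to the same dimension as the target representation, and the choice to average over stages are all illustrative assumptions, since this summary does not give the exact formulation.

```python
import math


def negative_cosine_similarity(local_feat, target, eps=1e-8):
    """Return -cos(local_feat, target) for two equal-length vectors.

    The value is -1 when the vectors point the same way, 0 when they are
    orthogonal, and +1 when they point in opposite directions, so
    minimizing it pulls local_feat toward the direction of target.
    """
    if len(local_feat) != len(target):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(local_feat, target))
    norm_a = math.sqrt(sum(a * a for a in local_feat))
    norm_b = math.sqrt(sum(b * b for b in target))
    # eps guards against division by zero for all-zero vectors.
    return -dot / max(norm_a * norm_b, eps)


def ncsl(stage_feats, target):
    """Average the negative cosine similarity over hierarchical stages.

    Assumption: stage_feats holds one already-projected feature vector
    per stage; averaging is an illustrative choice, not the paper's.
    """
    if not stage_feats:
        raise ValueError("stage_feats must not be empty")
    losses = [negative_cosine_similarity(f, target) for f in stage_feats]
    return sum(losses) / len(losses)
```

In a training setup this term would typically be added to the classification loss, with the target taken from the representation that feeds the classifier; how the two are weighted is left open here.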