Abstract: According to PBS, nearly one-third of Americans lack access to primary care services, and another forty percent delay seeking care to avoid medical costs. As a result, many diseases are left undiagnosed and untreated, even when they present visible symptoms on the skin. With the rise of AI, self-diagnosis and improved disease recognition have become more promising than ever; however, existing methods are limited by a lack of large-scale patient databases and by outdated methods of study, which restricts studies to only a few diseases or modalities. This study uses readily available and easily accessible patient information, in the form of images and text, for skin disease classification on a new dataset of 26 skin disease types that includes both skin disease images (37K) and associated patient narratives. Using this dataset, baselines for various image models were established that outperform existing methods. Initially, the ResNet-50 model achieved an accuracy of only 70%, but after various optimization techniques were applied, the accuracy improved to 80%. In addition, this study proposes Chain of Options, a novel fine-tuning strategy for sequence-classification Large Language Models (LLMs) that breaks down a complex reasoning task into intermediate steps at training time rather than at inference time. With Chain of Options and preliminary disease recommendations from the image model, this method achieves a state-of-the-art accuracy of 91% in diagnosing a patient's skin disease given just an image of the afflicted area and the patient's description of the symptoms (such as itchiness or dizziness). This research can enable earlier diagnosis of skin diseases and allow clinicians to work with deep learning models to give more accurate diagnoses, improving quality of life and saving lives.

Multi-modal Contrastive-Generative Pre-training for Fine-grained Skin Disease Diagnosis

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4

Gradient modulated contrastive distillation of low-rank multi-modal knowledge for disease diagnosis

G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training

Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation

Reply to "Letter to the editor: Angiotensin quantification by mass spectrometry".

Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training

Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?

Improving Medical Vision-Language Contrastive Pretraining with Semantics-aware Triage

Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images

Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding

XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images

A Multimodal Approach to The Detection and Classification of Skin Diseases

MISS: A Generative Pretraining and Finetuning Approach for Med-VQA

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models