LLaVA-Endo: a Large Language-and-vision Assistant for Gastrointestinal Endoscopy

Jieru Yao, Xueran Li, Qiang Xie, Longfei Han, Yiwen Jia, Nian Liu, Dingwen Zhang, Junwei Han
DOI: https://doi.org/10.1007/s11704-024-40319-8
IF: 2.6688
2024-01-01
Frontiers of Computer Science
Abstract: We introduce LLaVA-Endo, a large language-and-vision model designed for gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce a progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art (SoTA) multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation and to integrate further functionalities such as report generation and polyp segmentation.