GP-VLS: A general-purpose vision language model for surgery

Samuel Schmidgall, Joseph Cho, Cyril Zakka, William Hiesinger
2024-08-07
Abstract: Surgery requires comprehensive medical knowledge, visual assessment skills, and procedural expertise. While recent surgical AI models have focused on solving task-specific problems, there is a need for general-purpose systems that can understand surgical scenes and interact through natural language. This paper introduces GP-VLS, a general-purpose vision language model for surgery that integrates medical and surgical knowledge with visual scene understanding. For comprehensively evaluating general-purpose surgical models, we propose SurgiQual, which evaluates across medical and surgical knowledge benchmarks as well as surgical vision-language questions. To train GP-VLS, we develop six new datasets spanning medical knowledge, surgical textbooks, and vision-language pairs for tasks like phase recognition and tool identification. We show that GP-VLS significantly outperforms existing open- and closed-source models on surgical vision-language tasks, with 8-21% improvements in accuracy across SurgiQual benchmarks. GP-VLS also demonstrates strong performance on medical and surgical knowledge tests compared to open-source alternatives. Overall, GP-VLS provides an open-source foundation for developing AI assistants to support surgeons across a wide range of tasks and scenarios. The code and data for this work are publicly available at http://gpvls-surgery-vlm.github.io.
Computer Vision and Pattern Recognition, Machine Learning, Tissues and Organs
What problem does this paper attempt to address?
The paper sets out to develop GP-VLS, a general-purpose vision language model for surgery that can understand surgical scenes and interact with clinicians through natural language. Specifically, it aims to:

1. **Create a general-purpose surgical vision language model**: Most existing surgical AI models target specific tasks, and none offers scene understanding across a broad range of surgical tasks and scenarios. GP-VLS aims to fill this gap with a general-purpose model that integrates medical and surgical knowledge with visual scene understanding.
2. **Evaluate general-purpose surgical models**: To evaluate such models comprehensively, the authors propose SurgiQual, an evaluation suite that covers medical and surgical knowledge benchmarks as well as surgical vision-language questions.
3. **Develop new training datasets**: To train GP-VLS, the authors develop six new datasets spanning medical knowledge, surgical textbooks, and vision-language pairs for tasks such as phase recognition and tool identification. These datasets give the model the training material it needs to understand and process complex surgical scenes.

### Main contributions

1. **Open-source general-purpose surgical vision language model (GP-VLS)**: The model understands basic medical and surgical concepts and can also handle complex vision-language problems.
2. **Comprehensive evaluation suite (SurgiQual)**: Used to evaluate surgical vision language models on medical and surgical knowledge and on visual scene understanding.
3. **Six new surgical training datasets**: Five vision-language datasets and one dataset built from surgical textbooks, covering a wide range of surgical tasks.

### Solutions

By integrating medical and surgical knowledge with visual scene understanding, GP-VLS can support surgeons across several stages of their work, from preoperative planning to intraoperative guidance to postoperative care. The model can also explain its reasoning process, which is crucial for ensuring that the technology enhances rather than replaces human expertise.

### Summary

GP-VLS represents important progress toward general-purpose surgical AI assistants. By combining medical knowledge with specialized surgical knowledge and visual understanding, it lays the foundation for language-based surgical AI systems. Although challenges remain, the potential benefits of this model for surgical practice are substantial.
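As an illustration of the evaluation idea described above, where a model is scored separately on knowledge benchmarks and on vision-language tasks, the following minimal sketch computes exact-match accuracy per benchmark. It is not the authors' SurgiQual implementation: the function name `per_benchmark_accuracy`, the benchmark labels, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (benchmark_name, model_answer, reference_answer).
# The benchmark names and records below are made-up placeholders,
# not actual SurgiQual data or results.
Record = Tuple[str, str, str]


def per_benchmark_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Return exact-match accuracy per benchmark (case-insensitive)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for benchmark, predicted, reference in records:
        total[benchmark] += 1
        if predicted.strip().lower() == reference.strip().lower():
            correct[benchmark] += 1
    return {name: correct[name] / total[name] for name in total}


toy_records = [
    ("medical_knowledge", "B", "B"),
    ("medical_knowledge", "C", "A"),
    ("phase_recognition", "Dissection", "dissection"),
    ("tool_identification", "grasper", "scissors"),
]
scores = per_benchmark_accuracy(toy_records)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

Reporting one score per benchmark, rather than a single pooled number, keeps a strong result on knowledge questions from masking a weak result on a vision-language task, or the reverse.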