NanoNet: Real-Time Polyp Segmentation in Video Capsule Endoscopy and Colonoscopy

Debesh Jha, Nikhil Kumar Tomar, Sharib Ali, Michael A. Riegler, Håvard D. Johansen, Dag Johansen, Thomas de Lange, Pål Halvorsen
DOI: https://doi.org/10.48550/arXiv.2104.11138
2021-04-22
Abstract: Deep learning in gastrointestinal endoscopy can assist to improve clinical performance and be helpful to assess lesions more accurately. To this extent, semantic segmentation methods that can perform automated real-time delineation of a region-of-interest, e.g., boundary identification of cancer or precancerous lesions, can benefit both diagnosis and interventions. However, accurate and real-time segmentation of endoscopic images is extremely challenging due to its high operator dependence and high-definition image quality. To utilize automated methods in clinical settings, it is crucial to design lightweight models with low latency such that they can be integrated with low-end endoscope hardware devices. In this work, we propose NanoNet, a novel architecture for the segmentation of video capsule endoscopy and colonoscopy images. Our proposed architecture allows real-time performance and has higher segmentation accuracy compared to other more complex ones. We use video capsule endoscopy and standard colonoscopy datasets with polyps, and a dataset consisting of endoscopy biopsies and surgical instruments, to evaluate the effectiveness of our approach. Our experiments demonstrate the increased performance of our architecture in terms of a trade-off between model complexity, speed, model parameters, and metric performances. Moreover, the resulting model size is relatively tiny, with only nearly 36,000 parameters compared to traditional deep learning approaches having millions of parameters.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses real-time polyp segmentation in video capsule endoscopy and colonoscopy. Specifically, it develops a lightweight deep-learning architecture, NanoNet, that automatically identifies and segments polyp regions in medical images in real time. The technical challenge is to reduce the model's complexity and computational cost while preserving segmentation accuracy, so that it can run on low-end endoscopy hardware. NanoNet does this by using a pre-trained MobileNetV2 as the encoder and combining it with a modified residual block. The paper also creates and publicly releases a video capsule endoscopy dataset with 55 polyp annotations to support further research.
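To give a rough sense of why a MobileNetV2-style encoder helps keep the parameter count small, the sketch below compares the number of weights in a standard convolution with a depthwise separable convolution, the building block MobileNetV2 relies on. The channel sizes (32 in, 64 out) and the helper names are illustrative assumptions, not values or code from the paper, and the sketch does not reproduce NanoNet's actual layers.

```python
def conv2d_params(in_ch, out_ch, kernel=3, bias=True):
    """Weights in a standard k x k convolution (plus optional biases)."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def depthwise_separable_params(in_ch, out_ch, kernel=3, bias=True):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * in_ch + (in_ch if bias else 0)
    pointwise = in_ch * out_ch + (out_ch if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
standard = conv2d_params(32, 64)
separable = depthwise_separable_params(32, 64)

print(f"standard 3x3 conv:        {standard} parameters")
print(f"depthwise separable conv: {separable} parameters")
print(f"reduction factor:         {standard / separable:.1f}x")
```

Under these assumed sizes the separable layer needs several times fewer parameters than the standard one, which is the kind of saving that makes a model in the tens of thousands of parameters plausible.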