VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation

Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci
2024-10-23
Abstract: High-speed video (HSV) segmentation is essential for analyzing dynamic physical processes in scientific and industrial applications, such as boiling heat transfer. Existing models like U-Net struggle with generalization and accurately segmenting complex bubble formations. We present VideoSAM, a specialized adaptation of the Segment Anything Model (SAM), fine-tuned on a diverse HSV dataset for phase detection. Through diverse experiments, VideoSAM demonstrates superior performance across four fluid environments (Water, FC-72, Nitrogen, and Argon), significantly outperforming U-Net in complex segmentation tasks. In addition to introducing VideoSAM, we contribute an open-source HSV segmentation dataset designed for phase detection, enabling future research in this domain. Our findings underscore VideoSAM's potential to set new standards in robust and accurate HSV segmentation. The code and dataset used in this study are available online at https://github.com/chikap421/videosam.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the challenge of segmenting high-speed video (HSV) of highly dynamic physical processes, such as boiling heat transfer, which matter in scientific and industrial applications. Existing models like U-Net generalize poorly and segment complex bubble formations inaccurately. Specifically, the paper aims to:

1. **Improve segmentation accuracy**: Develop a model that segments complex bubble structures more accurately in high-speed videos recorded in different fluid environments.
2. **Enhance generalization ability**: Build a model that adapts to various fluid environments and imaging conditions, reducing dependence on a specific task or data distribution.
3. **Provide an open dataset**: Contribute an open-source HSV segmentation dataset designed for phase detection to support further research in this field.

To this end, the paper introduces **VideoSAM**, a specialized adaptation of the Segment Anything Model (SAM) that is fine-tuned on a diverse HSV dataset for scientific segmentation tasks. Experimental results show that VideoSAM significantly outperforms the U-Net baseline across four fluid environments (water, FC-72, nitrogen, and argon), particularly in complex segmentation tasks.
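
The summary above does not say which metric underlies the comparison with U-Net. As a hedged illustration only, intersection over union (IoU) is one common way to score a predicted binary phase mask against a ground-truth mask. The function name `mask_iou`, the label convention (1 = vapor, 0 = liquid), and the toy masks below are assumptions made for this sketch; none of them come from the VideoSAM code or dataset.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values.
# Assumed convention for this sketch: 1 = vapor (bubble), 0 = liquid.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks with identical shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is vapor in both masks
            union += p_on or t_on          # pixel is vapor in at least one mask

    # Two empty masks agree perfectly; treat that case as IoU = 1.0 by convention.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks, not real data.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(mask_iou(pred, truth))  # 2 shared pixels / 4 pixels in the union = 0.5
```

Averaging such a score over the frames of each fluid environment would give one per-environment number for each model, which is one plausible way to set up a VideoSAM versus U-Net comparison.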