ViT Cane: Visual Assistant for the Visually Impaired

Bhavesh Kumar
DOI: https://doi.org/10.48550/arXiv.2109.13857
2021-09-26
Abstract: Blind and visually challenged people face multiple issues with navigating the world independently. Some of these challenges include finding the shortest path to a destination and detecting obstacles from a distance. To tackle these issues, this paper proposes ViT Cane, which leverages a vision transformer model to detect obstacles in real time. Our entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8GB RAM, and 4 motors. With tactile input delivered through the 4 motors, the obstacle detection model is highly efficient in helping visually impaired people navigate unknown terrain, and it is designed to be easily reproduced. The paper discusses the utility of a vision transformer model in comparison to CNN-based models for this specific application. Through rigorous testing, the proposed obstacle detection model has achieved higher performance on the Common Objects in Context (COCO) data set than its CNN counterpart. Comprehensive field tests were conducted to verify the effectiveness of our system for holistic indoor understanding and obstacle avoidance.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses challenges that visually impaired people face when navigating the world independently, in particular finding the shortest path to a destination and detecting obstacles from a distance. It proposes the ViT Cane system, which uses a Vision Transformer model for real-time obstacle detection. The entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8GB of RAM, and four motors. Obstacle information reaches the user as tactile input through the four motors, which helps visually impaired people navigate unknown terrain, and the system is designed to be easily reproduced. The paper also discusses the utility of the Vision Transformer model relative to CNN-based models in this specific application. In the authors' testing, the proposed obstacle detection model outperforms its CNN counterpart on the Common Objects in Context (COCO) dataset. In addition, comprehensive field tests were carried out to verify the system's effectiveness for holistic indoor understanding and obstacle avoidance.
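The step from detected obstacles to tactile input on four motors can be illustrated with a minimal sketch. This is not the paper's implementation: the left-to-right motor layout, the function name `motor_intensities`, the `(x_min, x_max, confidence)` detection format, and the 0.5 confidence threshold are all assumptions made here for illustration. The detector itself (the vision transformer) is assumed to run upstream and is not shown.

```python
from typing import List, Tuple

# Hypothetical detection format: horizontal box extent in pixels plus a confidence score.
Detection = Tuple[float, float, float]  # (x_min, x_max, confidence)


def motor_intensities(
    detections: List[Detection],
    frame_width: float,
    n_motors: int = 4,
    min_conf: float = 0.5,
) -> List[float]:
    """Map detections to one vibration intensity per motor.

    Assumption: the frame is split into n_motors equal vertical zones,
    ordered left to right, and each zone drives one motor.
    """
    intensities = [0.0] * n_motors
    zone_width = frame_width / n_motors
    for x_min, x_max, conf in detections:
        if conf < min_conf:
            continue  # ignore low-confidence detections
        center = (x_min + x_max) / 2.0
        # Clamp the zone index so boxes on the frame edges stay in range.
        zone = max(0, min(int(center // zone_width), n_motors - 1))
        # The strongest detection in a zone sets that motor's intensity.
        intensities[zone] = max(intensities[zone], conf)
    return intensities


if __name__ == "__main__":
    boxes = [(0, 100, 0.9), (500, 640, 0.6), (200, 300, 0.3)]
    print(motor_intensities(boxes, frame_width=640))
```

In this sketch, an obstacle on the far left of a 640-pixel frame drives the first motor and one on the far right drives the fourth, while the low-confidence detection is ignored. On real hardware the returned values would still need to be converted into motor drive signals, which depends on the wiring and is left out here.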