A Vision Aid for the Visually Impaired using Commodity Dual-Rear-Camera Smartphones
Minh Nguyen, Huy Le, Wei Qi Yan, Arpita Dawda
DOI: https://doi.org/10.1109/M2VIP.2018.8600857
2018-01-01
Abstract: Dual (or multiple) rear cameras on hand-held smartphones are believed to be the future of mobile photography. Recently, many such phones have been released, mainly with dual rear cameras: one wide-angle and one telephoto. Notable examples are the Apple iPhone 7 and 8 Plus, iPhone X, Samsung Galaxy S9, LG V30, and Huawei Mate 10. With built-in dual-camera systems, these devices can not only produce better-quality pictures but also acquire 3D stereo photos (with depth information). Thus, they can capture moments with depth, much like our two-eye visual system. Thanks to this trend, these phones are getting cheaper while becoming more powerful. In this paper, we describe a system that uses commercial dual-rear-camera phones, such as the iPhone X, to aid people who are visually impaired. We propose a design that places the phone at the centre of the user's chest; the user wears one or two Bluetooth headphones to listen to the phone's audio output. Our system consists of three modules: (1) scene context recognition to audio, (2) 3D stereo reconstruction to audio, and (3) interactive audio/voice controls. In slightly more detail, the wide-angle camera captures live photos, which are analysed by a GPS-guided deep learning process to describe the scene in front of the user (module 1). The telephoto camera captures a narrower field of view, which is stereo-reconstructed with the aid of the wide-angle image to form a depth map (a dense, area-based distance map). The map helps determine the distance to all visible objects so that the user can be notified of the critical ones (module 2). This module also makes the phone vibrate when an object is close enough to the user, e.g. within hand's reach. The user can also query the system by asking various questions and receive automatic voice answers (module 3).
In addition, a manual rescue module (module 4) is provided for cases where something has gone wrong. An example of the vision-to-audio output could be “Overall, likely a corridor, one medium object is 0.5 m away - central left”, or “Overall, city pathway, front cleared”. An audio command input may be “read texts”, upon which the phone detects and reads all text on the closest object. More details on the design and implementation are described in this paper.
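To make the depth-map-to-audio step of module 2 more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the depth map is already available as a grid of distances in metres, and it simply reports the nearest valid pixel rather than segmenting objects or estimating their size. The names `HAND_REACH_M`, `ZONES`, `nearest_obstacle`, and `describe`, the 0.7 m vibration threshold, the 3 m reporting range, and the five-band position labels are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold (not from the paper): objects at or below this
# distance are treated as being within hand's reach and trigger vibration.
HAND_REACH_M = 0.7

# Hypothetical labels for five equal-width vertical bands of the depth map.
ZONES = ["far left", "central left", "centre", "central right", "far right"]


def nearest_obstacle(
    depth_map: List[List[Optional[float]]], max_range_m: float = 3.0
) -> Optional[Tuple[float, str]]:
    """Return (distance, zone) of the closest valid pixel, or None if clear.

    Pixels that are None, non-positive, or farther than max_range_m are ignored.
    """
    best: Optional[Tuple[float, str]] = None
    for row in depth_map:
        width = len(row)
        for col, d in enumerate(row):
            if d is None or d <= 0 or d > max_range_m:
                continue
            if best is None or d < best[0]:
                # Map the column index to one of the horizontal bands.
                band = min(col * len(ZONES) // width, len(ZONES) - 1)
                best = (d, ZONES[band])
    return best


def describe(
    scene_label: str, depth_map: List[List[Optional[float]]]
) -> Tuple[str, bool]:
    """Build the spoken message and a flag saying whether to vibrate."""
    hit = nearest_obstacle(depth_map)
    if hit is None:
        return f"Overall, {scene_label}, front cleared", False
    distance, zone = hit
    message = f"Overall, {scene_label}, one object is {distance:.1f} m away - {zone}"
    return message, distance <= HAND_REACH_M
```

Calling `describe("likely a corridor", [[4.0, 0.5, 2.0, 4.0, 4.0]])` yields a message in the style of the abstract's first example together with a vibration flag of `True`, while a map with no reading inside the 3 m range produces the "front cleared" message. A practical system would likely group neighbouring pixels into objects before reporting, which this sketch deliberately omits.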