Abstract: Dual (or multiple) rear cameras on hand-held smartphones are believed to be the future of mobile photography. Recently, many such phones have been released (mainly with dual rear cameras: one wide-angle and one telephoto). Some of the notable ones are the Apple iPhone 7 and 8 Plus, iPhone X, Samsung Galaxy S9, LG V30, and Huawei Mate 10. With built-in dual-camera systems, these devices can not only produce better-quality pictures but also acquire 3D stereo photos (with depth information). Thus, they can capture moments in life with depth, much as our two-eye visual system does. Thanks to this trend, these phones are becoming cheaper while growing more powerful. In this paper, we describe a system that uses commercial dual-rear-camera phones, such as the iPhone X, to aid people who are visually impaired. We propose placing the phone at the centre of the user's chest, with the user wearing one or two Bluetooth headphones to listen to the phone's audio output. Our system consists of three modules: (1) scene context recognition to audio, (2) 3D stereo reconstruction to audio, and (3) interactive audio/voice controls. In slightly more detail, the wide-angle camera captures live photos, which are analysed by a GPS-guided deep learning process to describe the scene in front of the user (module 1). The telephoto camera captures a narrower field of view, which is stereo-reconstructed with the aid of the wide-angle image to form a depth map (a dense, area-based distance map). The map helps determine the distance to all visible objects so that the user can be notified of the critical ones (module 2). This module also makes the phone vibrate when an object is close enough to the user, e.g. within hand-reach distance. The user can also query the system by asking various questions and receive automatic voice answers (module 3).
In addition, a manual rescue module (module 4) is included for cases where other things have gone wrong. An example of the vision-to-audio output could be “Overall, likely a corridor, one medium object is 0.5 m away - central left”, or “Overall, city pathway, front cleared”. An audio command input may be “read texts”, upon which the phone detects and reads all text on the closest object. More details on the design and implementation are given in this paper.
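
As a rough illustration of the depth-to-audio step in module 2, the sketch below turns a small depth map into a spoken-style message and a vibration flag. It is not the authors' implementation: the function names, the five horizontal zones, and both thresholds (`HAND_REACH_M`, `ALERT_RANGE_M`) are assumptions chosen for the example, and the depth map is taken to be a plain grid of distances in metres.

```python
from typing import List, Tuple

HAND_REACH_M = 0.7   # assumed "hand reach" distance; the abstract gives no value
ALERT_RANGE_M = 3.0  # assumed range beyond which the front counts as clear

# Hypothetical horizontal zones, left to right across the depth map.
ZONES = ["left", "central left", "central", "central right", "right"]


def closest_object(depth_map: List[List[float]]) -> Tuple[float, str]:
    """Return the smallest valid distance (m) and the zone it falls in."""
    width = len(depth_map[0])
    best_dist, best_col = float("inf"), 0
    for row in depth_map:
        for col, dist in enumerate(row):
            # Non-positive values are treated as "no depth estimate".
            if 0.0 < dist < best_dist:
                best_dist, best_col = dist, col
    zone = ZONES[min(best_col * len(ZONES) // width, len(ZONES) - 1)]
    return best_dist, zone


def describe(depth_map: List[List[float]]) -> Tuple[str, bool]:
    """Build an audio message and decide whether the phone should vibrate."""
    dist, zone = closest_object(depth_map)
    if dist > ALERT_RANGE_M:
        return "front cleared", False
    message = f"closest object is {dist:.1f} m away - {zone}"
    return message, dist <= HAND_REACH_M


if __name__ == "__main__":
    demo = [
        [4.0, 0.5, 4.0, 4.0, 4.0],
        [4.0, 4.0, 4.0, 4.0, 4.0],
    ]
    print(describe(demo))
```

A real system would work on object regions rather than single pixels and would pass the message to a text-to-speech engine; the sketch only shows how a distance and a coarse direction can be derived from the map.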

SmartDepthSync: Open Source Synchronized Video Recording System of Smartphone RGB and Depth Camera Range Image Frames with Sub-millisecond Precision

DoCam: Depth Sensing with an Optical Image Stabilization Supported RGB Camera.

MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras.

DELTAR: Depth Estimation from a Light-Weight ToF Sensor and RGB Image

FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras

The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement

AutoDepthNet: High Frame Rate Depth Map Reconstruction using Commodity Depth and RGB Cameras

Mobile3DScanner: an Online 3D Scanner for High-quality Object Reconstruction with a Mobile Device

Synthetic depth-of-field with a single-camera mobile phone

Relative Depth Estimation With An Uncalibrated Camera For Image Refocus

A Vision Aid for the Visually Impaired using Commodity Dual-Rear-Camera Smartphones

Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles

Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone

Handheld Multi-Frame Super-Resolution

HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception

Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor

A Dual Camera System for High Spatiotemporal Resolution Video Acquisition

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report.

SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis

Real-time single image depth perception in the wild with handheld devices

Fast visual odometry and mapping from RGB-D data