[Figure: system block diagram. Blocks: Laser Range Finder, Microphone Array, Occupancy Grids, Odometry, Sound Direction, Global 2D Occupancy Grid Mapping, Global 2D Self-Localization, Hector SLAM, Sound Localization, Audio Ray Tracing, Triangulation, Sensory Audio Ray Rejection]
Daobilige Su, Keisuke Nakamura, K. Nakadai, J. V. Miró
Abstract: This paper investigates sound source mapping in a real environment using a mobile robot. Our approach is based on audio ray tracing, which integrates occupancy grids and sound source localization using a laser range finder and a microphone array. Previous audio ray tracing approaches rely on all observed rays and grids. As such, observation errors caused by sound reflection, sound occlusion, wall occlusion, sounds at misdetected grids, etc. can significantly degrade the ability to locate sound sources in a map. A three-layered selective audio ray tracing mechanism is proposed in this work. The first layer conducts frame-based unreliable ray rejection (sensory rejection) considering sound reflection and wall occlusion. The second layer introduces triangulation and audio tracing to detect falsely detected sound sources, rejecting audio rays associated with these misdetected sound sources (short-term rejection). A third layer is tasked with rejecting rays using the whole history (long-term rejection) to disambiguate sound occlusion. Experimental results under various situations are presented, which demonstrate the effectiveness of our method.

I. MOTIVATION AND BACKGROUND

The ability of a robot to build a map of its surroundings is a fundamental characteristic required for autonomous navigation in unknown spaces. Most Simultaneous Localization And Mapping (SLAM) systems implemented for indoor environments are vision or LIDAR based [1]. Despite substantial developments with these sensing modalities, audio-based mapping is still at an early stage, and remains an open subject of research given the particularly challenging conditions associated with environmental acoustic noise and reflections. Because of its importance, e.g. for Human-Robot Interaction, sound source mapping has recently become a major challenge in the field of robot audition, and several methods have been reported. The existing methods fall mainly into two approaches.
The first approach combines Sound Source Localization (SSL) and the robot’s odometry within localization strategies such as triangulation [2], particle filters [3], FastSLAM [4], Evidence Grids [5] or PSFS [6]. The approach is relatively easy to implement as it relies only on a microphone array mounted on the robot. Its performance is relatively unaffected by external factors such as room shape, although it is constrained by two critical factors:
1) The robot needs to see all sound sources from different angles so as to locate the sound sources precisely.
2) Sound reflection is not taken into consideration, resulting in performance degradation in reverberant environments.
Issue 1) is particularly apparent when the robot drives directly towards a sound source; the sound source is observed from only one angle and the methods will eventually fail. This situation is more likely to happen when the robot is driving in a narrow corridor and there is one sound source at the end of the corridor. Issue 2) becomes especially critical when applying the methods in indoor environments.

1 Daobilige Su and Jaime Valls Miro are with Centre for Autonomous System (CAS), University of Technology, Sydney (UTS), Australia. 2 Keisuke Nakamura and Kazuhiro Nakadai are with Honda Research Institute Japan Co., Ltd., Wako, Saitama, 351-0114, Japan {keisuke,nakadai}

The second approach relies on occupancy grids in addition to SSL and odometry to develop a ray tracing approach to detect sound sources [7]. Thanks to the fusion of SSL with the distance scan data, the locations of sound sources can be obtained from a single position of the robot. Therefore, sensing sound sources from different angles is no longer needed, which resolves issue 1). This approach mainly relies on the following assumptions:
3) An audio ray hits a sufficiently narrow area of occupied grids so that the sound location is uniquely determined.
4) All sound sources are on occupied grids.
5) Sounds do not pass through occupied grids.
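As a rough illustration of the second approach (a minimal sketch, not the implementation of [7] or of this paper), the code below steps an audio ray from the robot position along the SSL bearing through a 2D occupancy grid and returns the first occupied cell it meets. The function name `trace_audio_ray`, the boolean grid representation, the half-cell step size and the `max_range` parameter are all assumptions made for illustration.

```python
import math


def trace_audio_ray(grid, origin, bearing, cell_size=1.0, max_range=50.0):
    """Return (row, col) of the first occupied cell hit by the ray, or None.

    grid      : list of rows; a truthy entry marks an occupied cell (assumed layout)
    origin    : (x, y) robot position in metres; x maps to columns, y to rows
    bearing   : sound direction in radians in the map frame (SSL + robot pose)
    cell_size : edge length of one grid cell in metres
    max_range : give up after tracing this far (hypothetical parameter)
    """
    step = 0.5 * cell_size  # sample twice per cell to reduce skipped cells
    x0, y0 = origin
    dx, dy = math.cos(bearing), math.sin(bearing)
    for i in range(1, int(max_range / step) + 1):
        x = x0 + i * step * dx
        y = y0 + i * step * dy
        col = math.floor(x / cell_size)
        row = math.floor(y / cell_size)
        # Leaving the map without a hit means no candidate source location.
        if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[0]):
            return None
        if grid[row][col]:
            return (row, col)
    return None


# 5 x 5 map with a wall along the right-hand column.
grid = [[0, 0, 0, 0, 1] for _ in range(5)]
hit = trace_audio_ray(grid, origin=(0.5, 2.5), bearing=0.0)
```

In this toy map a source standing in a free cell, for example at column 2 of row 2, would still be traced to the wall cell behind it, because the sketch takes assumptions 4) and 5) for granted.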
However, these assumptions are not always satisfied in real-world applications. For instance, assumption 3) is problematic when there is wall occlusion. Especially when localizing a single isolated sound source, the wall behind the true sound source will get a higher probability and be mistakenly detected as a sound source [7]. Assumption 4) is often not met when using a planar laser range finder. If there is any sound source that cannot be scanned by the laser, this method will trace the location of the sound source to the obstacle behind it and fail. Assumption 5) is also not satisfied if there are acoustically transparent materials or low walls which permit diffraction, so that the wall in front of the true sound source mistakenly receives a higher probability.

In this paper, a mechanism for sound source mapping that is suitable for real environments and able to tackle the above issues is investigated. Building on the groundwork of our previous approach [8], we use audio rays combined with occupancy grids to solve issue 1). To solve the other issues, this paper proposes a three-layered selective audio ray tracing mechanism inspired by the multi-store model. The first layer conducts frame-based unreliable ray rejection (sensory rejection) considering sound reflection and wall occlusion to solve issues 2) and 3). To solve issue 4), the second layer introduces triangulation [2] using all observed audio rays to detect sounds at misdetected