Sound source localization for auditory perception of a humanoid robot using deep neural networks

G. Boztas
DOI: https://doi.org/10.1007/s00521-022-08047-x
2022-11-29
Neural Computing and Applications
Abstract: This paper presents an estimation of the sound source location using deep neural networks in order to provide auditory perception for a humanoid robot. Estimating the location of a moving sound source is crucial for a humanoid robot to remain functional in environments where the robot's camera cannot operate. It plays an important role, especially in a recovery scenario with no visual contact. In this study, the sound emitted by a source around the robot was recorded by four microphones placed on the humanoid robot's head. A wheeled robot was used to obtain the sound source position with odometry. The recorded sound dataset and the collected odometry dataset were used as input data and target data, respectively. The discrete wavelet transform (DWT) was applied to pre-process the input data. After pre-processing, the obtained matrices were used as inputs to the proposed convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (biLSTM), and multilayer perceptron (MLP) networks to estimate the sound source location around the humanoid robot. Across all tests of the estimation models created with the proposed networks, the $R^2$ metrics of the biLSTM structure were approximately 0.97. This study showed experimentally that humanoid robots can sense the position of a sound source in the environment with sufficient accuracy, like many living creatures.
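The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state the wavelet family, decomposition depth, or frame length, so the Haar wavelet, the two decomposition levels, and the function names `haar_dwt`, `dwt_features`, and `r2_score` are all assumptions chosen for illustration. The sketch turns one frame from four microphone channels into a coefficient matrix (one row per channel) and computes the $R^2$ score between target and predicted positions.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT (assumed wavelet): returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(channels, levels=2):
    """Build one row of multi-level Haar coefficients per microphone channel.

    Row layout: [coarsest approximation, coarsest detail, ..., finest detail].
    """
    rows = []
    for channel in channels:
        approx, details = list(channel), []
        for _ in range(levels):
            approx, detail = haar_dwt(approx)
            details.append(detail)
        row = list(approx)
        for detail in reversed(details):
            row.extend(detail)
        rows.append(row)
    return rows


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical frame: four microphone channels, eight samples each.
    frame = [[math.sin(0.5 * n + 0.3 * m) for n in range(8)] for m in range(4)]
    matrix = dwt_features(frame, levels=2)
    print(len(matrix), len(matrix[0]))  # rows = channels, columns = coefficients
```

A matrix of this shape is the kind of input the abstract describes feeding to the CNN, LSTM, biLSTM, and MLP models; the network definitions themselves are outside the scope of this sketch.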
computer science, artificial intelligence