Audio-Visual Bimodal Combination-Based Speaker Tracking Method for Mobile Robot

Hao-Yan Zhang, Long-Bo Zhang, Qi-Feng Shi, Zhen-Tao Liu
DOI: https://doi.org/10.20965/jaciii.2024.p0196
2024-01-01
Journal of Advanced Computational Intelligence and Intelligent Informatics
Abstract: Initiative service is a key research direction for the new generation of service robots, and automatically tracking humans is an important part of initiative service in human-robot interaction. To address the low precision and poor anti-interference capability of using only single-modal (audio or visual) information, a speaker positioning and tracking method based on an audio-visual bimodal combination is proposed. First, the azimuth of the speaker is obtained from the time difference of arrival using a microphone array, and face detection based on AdaBoost is carried out using the camera. A distance and azimuth calculation model is established to obtain the position of the speaker. Second, a speaker positioning strategy based on the audio-visual bimodal combination is designed to handle different situations. Third, a path is planned that keeps the azimuth and distance between the robot and the speaker within a limited range. Various simulations are performed with different azimuths and distances set for speaker tracking. Finally, the mobile robot is driven to follow the path using the STM32 real-time control system. Information from the microphone array and the camera is collected and processed by a Raspberry Pi. The tracking accuracy was tested in a single-face situation by setting 20 different target points, with 10 tests carried out at each point. In multi-face situations, the audio-visual bimodal information is combined to identify the speaker, and a Kalman filter is then used for face tracking. The experimental results demonstrate that the running trajectory of the mobile robot is close to the ideal trajectory, which ensures effective speaker tracking.
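The abstract does not give the azimuth formula, so the following is only a minimal sketch of how a time difference of arrival (TDOA) can be turned into an azimuth. It assumes a single pair of microphones and a far-field (plane-wave) source, for which sin(theta) = c * tau / d. The function name `azimuth_from_tdoa`, the parameter names, and the speed-of-sound constant are illustrative assumptions, not the authors' implementation.

```python
import math

# Assumed speed of sound in air at about 20 degrees Celsius (m/s).
SPEED_OF_SOUND = 343.0


def azimuth_from_tdoa(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Estimate the source azimuth in degrees from a two-microphone TDOA.

    Hypothetical helper: assumes a far-field source, so the extra path
    length to the farther microphone is c * tau = mic_spacing * sin(theta).
    The angle is measured from the broadside direction of the pair and its
    sign follows the sign of tau.
    """
    if mic_spacing <= 0:
        raise ValueError("mic_spacing must be positive")
    ratio = c * tau / mic_spacing
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # A delay equal to half the maximum possible delay for a 10 cm pair.
    spacing = 0.10
    tau = 0.5 * spacing / SPEED_OF_SOUND
    print(f"estimated azimuth: {azimuth_from_tdoa(tau, spacing):.2f} degrees")
```

A single pair only resolves angles within 90 degrees either side of broadside and cannot tell front from back, which is one reason a practical system would use a multi-microphone array and, as in the paper, combine the audio estimate with visual face detection.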