Abstract: We propose Audio Noise Awareness using Visuals of Indoors for NAVIgation (ANAVI) for quieter robot path planning. While humans are naturally aware of the noise they make and its impact on those around them, robots currently lack this awareness. A key challenge in achieving audio awareness for robots is estimating how loud the robot's actions will be at a listener's location. Since sound depends upon the geometry and material composition of rooms, we train the robot to passively perceive loudness using visual observations of indoor environments. To this end, we generate data on how loud an 'impulse' sounds at different listener locations in simulated homes, and train our Acoustic Noise Predictor (ANP). Next, we collect acoustic profiles corresponding to different actions for navigation. Unifying ANP with action acoustics, we demonstrate experiments with wheeled (Hello Robot Stretch) and legged (Unitree Go2) robots so that these robots adhere to the noise constraints of the environment. See code and data at https://anavi-corl24.github.io/

What problem does this paper attempt to address?

The problem this paper addresses is how a robot planning a path indoors can account for the effect of its own noise on nearby people, and adjust its behavior to reduce the disturbance.

### Problem Background

Robots can be fairly loud (for example, 70 decibels) when moving or performing tasks indoors, which disturbs people who are sensitive to sound at that moment, such as people in a meeting or children who have just fallen asleep. Existing solutions usually stop the robot entirely through a "Do Not Disturb" mode, which is not ideal because it sacrifices the robot's functionality. The researchers therefore aim for a method that lets the robot adapt its behavior to the noise constraints of the environment without affecting efficiency.

### Core Objectives of the Paper

The paper proposes a framework named ANAVI (Audio Noise Awareness using Visuals of Indoors for Navigation). It aims to let the robot predict how loud its actions will be at a specific location and adjust its path planning according to these predictions, thereby achieving quieter navigation.

### Key Challenges

1. **Volume Prediction**: The robot needs to predict how loud its actions will sound at the listener's position. This depends not only on distance, but also on the geometry and material properties of the room.
2. **Visual Perception**: The robot needs to perceive the geometric structure and material properties of the environment through visual observation, so that it can estimate the propagation and attenuation of sound more accurately.
3. **Path Planning**: Based on the predicted noise level, the robot needs to find a path that completes the task while keeping the impact of noise as low as possible.

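
The volume prediction challenge says that loudness depends on more than distance. For comparison, a distance-only estimate can be sketched as below. This is a minimal illustration and not the paper's baseline or its ANP model: it assumes a point source in free field, where the level drops by 20·log10(d/d_ref) dB, and the function name and the 1 m reference distance are hypothetical choices. The 70 dB value is the example figure quoted above.

```python
import math


def distance_only_loudness_db(source_db, distance_m, ref_distance_m=1.0):
    """Estimate loudness at a listener from distance alone.

    Assumption: free-field point source, so the level falls by
    20 * log10(distance / reference distance) dB. Room geometry and
    materials are ignored, which is exactly the limitation noted above.
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return source_db - 20.0 * math.log10(distance_m / ref_distance_m)


# A robot measured at 70 dB from 1 m away, heard from 2 m and 10 m.
for d in (2.0, 10.0):
    print(f"{d:>4} m: {distance_only_loudness_db(70.0, d):.2f} dB")
```

Under this assumption the level drops by about 6 dB for every doubling of distance. Walls, openings, and absorbing materials break that rule indoors, which is why the paper turns to visual observations.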
### Solutions

To achieve this goal, the authors propose the following:

- **Acoustic Noise Prediction (ANP) Model**: This model combines visual features with the relative distance and direction between the listener and the robot to predict the maximum decibel level of the robot's actions at the listener's position. Specifically, the model uses a pre-trained image encoder (such as ResNet-18) to extract visual features, and a direction-distance encoder and a prediction module to produce the final volume prediction.
- **Simulation Experiments**: The authors ran extensive experiments in the Habitat 2.0 simulator using the Matterport3D dataset to verify the effectiveness of the ANP model.
- **Real-World Experiments**: The authors also tested in real environments, recording the sounds of the robot's actions at different positions and comparing them with the model's predictions.

### Main Contributions

1. **Performance Improvement**: Compared with the distance-based heuristic and other baseline models, the ANP model predicts volume more accurately, especially in complex environments.
2. **Real-World Applicability**: The authors show the application potential of the ANAVI framework in the real world, demonstrating that it can help robots plan quieter paths.
3. **Multi-Modal Learning**: By combining visual and auditory information, the model can better capture the acoustic properties of the environment and make more reasonable decisions.

In conclusion, this paper addresses noise awareness and path planning for robots in indoor environments by combining visual information with deep learning, laying a foundation for the wider use of intelligent robots in human living spaces.
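
To make the noise-constrained path planning idea concrete, here is a small sketch of one way a planner could use loudness predictions. It is not the authors' planner. It assumes a 4-connected grid with 1 m cells, a Dijkstra search, and a hypothetical stand-in predictor `predicted_db` based on distance only; in the paper's setting that role would be played by the learned ANP model. The names `plan_quiet_path`, `limit_db`, and `penalty` are illustrative.

```python
import heapq
import math


def predicted_db(cell, listener, source_db=70.0):
    """Stand-in predictor (assumption): distance-only attenuation, 1 cell = 1 m."""
    d = max(1.0, math.dist(cell, listener))  # clamp to avoid log10(0)
    return source_db - 20.0 * math.log10(d)


def plan_quiet_path(grid, start, goal, listener, limit_db, penalty=10.0):
    """Dijkstra on a 4-connected grid; grid[r][c] == 1 marks an obstacle.

    Entering a cell costs 1, plus `penalty` for every dB by which the
    predicted level at the listener exceeds `limit_db`.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if cost > best.get(cell, math.inf):
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc]:
                continue
            excess = max(0.0, predicted_db(nxt, listener) - limit_db)
            new_cost = cost + 1.0 + penalty * excess
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost, nxt))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


# 5x5 empty room with a listener in the middle.
room = [[0] * 5 for _ in range(5)]
quiet = plan_quiet_path(room, (2, 0), (2, 4), listener=(2, 2), limit_db=64.0)
direct = plan_quiet_path(room, (2, 0), (2, 4), listener=(2, 2), limit_db=200.0)
print(len(direct) - 1, "steps without a noise limit")
print(len(quiet) - 1, "steps with a 64 dB limit")
```

With the 64 dB limit the planner walks around the edge of the room instead of passing the listener, trading a longer route for a lower predicted level. A soft penalty is used here rather than a hard constraint so that a path is still returned when no fully quiet route exists; that is a design choice of this sketch.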

ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
