Abstract: We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning. The experiments show that current deep reinforcement learning approaches fail in large realistic environments. The experiments also indicate that multimodality is beneficial in learning to navigate cluttered scenes. MINOS is released open-source to the research community at http://minosworld.org. A video that shows MINOS can be found at https://youtu.be/c0mL9K64q84.

What problem does this paper attempt to address?

This paper aims to develop and evaluate multisensory models for goal-directed navigation in complex indoor environments. To that end, the authors introduce MINOS (Multimodal Indoor Simulator), a simulator built for navigation in such environments. MINOS is intended to support research on the following:

1. **Multimodal perception**: Study multimodal perception through flexible configuration of multiple sensors (such as vision, depth, surface normals, touch, and semantic segmentation).
2. **The impact of environmental complexity**: Analyze how environmental complexity affects navigation performance.
3. **The role of multimodality in sensorimotor learning**: Run controlled experiments on the advantages of multimodality in sensorimotor learning.

### Main problems

- **Limitations of existing methods**: Current navigation methods based on deep reinforcement learning perform poorly in large realistic environments. For example, in medium-sized Matterport3D scenes, the most successful method has a success rate of no more than 20% on the PointGoal task, and only 14% on the RoomGoal task.
- **Advantages of multimodal perception**: Experiments show that depth and tactile information are particularly useful for learning to navigate, and in some cases are more effective than visual information. Combining multiple sensory inputs can significantly improve navigation performance in cluttered environments.

### Solutions

MINOS addresses these problems in the following ways:

- **Large realistic environments**: Uses more than 45,000 three-dimensional house models from the SUNCG and Matterport3D datasets to provide rich training and testing environments.
- **Flexible multimodal sensor configuration**: Lets users customize the number, location, and parameters of sensors.
- **Efficient simulation framework**: Renders hundreds of frames per second, which makes training over millions of simulation steps practical.
- **Benchmark tasks**: Defines three goal-directed navigation tasks (PointGoal, ObjectGoal, and RoomGoal) for evaluating different algorithms.

### Experimental results

Through these experiments, the authors demonstrate the difficulties that existing deep reinforcement learning methods face in complex realistic environments and show the importance of multimodal perception in navigation. As an open-source platform, MINOS supports future research in this area.

### Formula representation

Some formulas involved in the paper can be written as follows:

- **Value function**:
  \[ V(s_t)=\mathbb{E}\left[\sum_{k=0}^{T-t}\gamma^k r_{t+k+1}\mid s_t\right] \]
  where \(\gamma\) is the discount factor and \(r_{t+k+1}\) is the reward received \(k+1\) steps after time \(t\).
- **Policy gradient update**:
  \[ \theta_{t+1}=\theta_t+\alpha\nabla_\theta J(\theta) \]
  where \(J(\theta)\) is the objective function and \(\alpha\) is the learning rate.

Through these improvements and experiments, MINOS provides new tools and insights for navigation research in complex indoor environments.
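As a minimal sketch of the value function and policy-gradient update summarized above, the following Python snippet computes a discounted return, averages returns over sampled episodes as a Monte Carlo estimate of \(V(s_t)\), and applies one gradient-ascent step. It is illustrative only: the function names (`discounted_return`, `estimate_value`, `gradient_ascent_step`) are hypothetical, the reward sequences and gradient are assumed to be supplied by the caller, and nothing here is taken from the MINOS codebase or the authors' training setup.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_k gamma^k * rewards[k] for rewards observed after step t."""
    total = 0.0
    # Accumulate from the last reward backwards: G_k = r_k + gamma * G_{k+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_value(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Monte Carlo estimate of V(s_t): mean discounted return over episodes
    that all start from the same state s_t (an assumption of this sketch)."""
    returns = [discounted_return(episode, gamma) for episode in episodes]
    return sum(returns) / len(returns)


def gradient_ascent_step(
    theta: Sequence[float], grad: Sequence[float], alpha: float
) -> List[float]:
    """Apply theta <- theta + alpha * grad, where grad approximates
    the gradient of J(theta) and is computed elsewhere."""
    return [p + alpha * g for p, g in zip(theta, grad)]


if __name__ == "__main__":
    # Hypothetical reward sequences, not data from the paper.
    episodes = [[1.0, 0.0, 2.0], [0.5]]
    print(estimate_value(episodes, gamma=0.5))
    print(gradient_ascent_step([1.0, 2.0], [0.5, -1.0], alpha=0.1))
```

Accumulating the return backwards avoids recomputing powers of \(\gamma\); in practice the gradient would come from an automatic-differentiation library rather than being passed in by hand.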

MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments

A Navigation Cognitive System Driven by Hierarchical Spiking Neural Network

Learning to Navigate in Complex Environments

Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

Learning Cognitive Map Representations for Navigation by Sensory–Motor Integration

Multi goals and multi scenes visual mapless navigation in indoor using meta-learning and scene priors

MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience

Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments

Multimodal sensory integration and concurrent navigation strategies for spatial cognition in real and artificial organisms

A Study on Learning Social Robot Navigation with Multimodal Perception

Multimodal Large Language Model for Visual Navigation

Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments

SOCIALGYM 2.0: Simulator for Multi-Agent Social Robot Navigation in Shared Human Spaces

Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation

MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction

Brain-Inspired Multimodal Navigation With Multiscale Hippocampal–Entorhinal Neural Network

Multi-Object Navigation with dynamically learned neural implicit representations

Autonomous Navigation in Complex Environments with Deep Multimodal Fusion Network

Out of the Box: Embodied Navigation in the Real World

MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception