Abstract: Mobile robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an office or a kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as doors or furniture, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features with objects, and contextual relations to associate objects with scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer-vision-based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus-of-attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot, which is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating in office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques for scene recognition.
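As a rough illustration of the idea in the abstract, the sketch below updates a scene posterior from object detections and picks the next object to search for by expected information gain. It is not the authors' model: the scene and object names, the probability table `P_OBJ_GIVEN_SCENE`, and all function names are hypothetical, and it assumes a noiseless detector and conditionally independent objects, leaving out the low-level feature classifiers and 3D range data.

```python
import math

SCENES = ["office", "kitchen"]

# Hypothetical P(object present | scene); illustrative values only,
# not taken from the paper.
P_OBJ_GIVEN_SCENE = {
    "monitor": {"office": 0.90, "kitchen": 0.05},
    "sink": {"office": 0.05, "kitchen": 0.80},
    "door": {"office": 0.70, "kitchen": 0.60},
}


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def update(prior, obj, detected):
    """Bayes update of the scene distribution after looking for `obj`.

    Assumes a perfect detector: `detected` equals object presence.
    """
    post = {}
    for scene, p in prior.items():
        like = P_OBJ_GIVEN_SCENE[obj][scene]
        post[scene] = p * (like if detected else 1.0 - like)
    z = sum(post.values())
    return {scene: v / z for scene, v in post.items()}


def expected_information_gain(prior, obj):
    """Expected reduction in scene entropy from searching for `obj`."""
    p_det = sum(prior[s] * P_OBJ_GIVEN_SCENE[obj][s] for s in prior)
    h_after = (p_det * entropy(update(prior, obj, True))
               + (1.0 - p_det) * entropy(update(prior, obj, False)))
    return entropy(prior) - h_after


def most_informative_object(prior, candidates):
    """Pick the candidate object whose search is expected to help most."""
    return max(candidates, key=lambda o: expected_information_gain(prior, o))


# Start from a uniform belief over scenes.
prior = {s: 1.0 / len(SCENES) for s in SCENES}
best = most_informative_object(prior, list(P_OBJ_GIVEN_SCENE))
posterior = update(prior, "sink", detected=True)
```

With this toy table, an object that is almost equally likely in both scenes (the door) yields little expected gain, so a search budget would go to more discriminative objects first. This is one plausible reading of "selectively guiding the search for informative objects", not a reconstruction of the paper's scheme.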

Robotic Indoor Scene Captioning from Streaming Video

Dense captioning and multidimensional evaluations for indoor robotic scenes

Scene Classification in Indoor Environments for Robots using Context Based Word Embeddings

Scenario-Aware Recurrent Transformer for Goal-Directed Video Captioning

Seeing Bot

Indoor Scene Classification Algorithm Based on an Object Vector for Robot Applications

Streaming Dense Video Captioning

A Mobile Robot Generating Video Summaries of Seniors' Indoor Activities

Indoor scene recognition by a mobile robot through adaptive object detection

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

Adversarial Reinforcement Learning With Object-Scene Relational Graph for Video Captioning

Embodied Semantic Scene Graph Generation

Explore and Tell: Embodied Visual Captioning in 3D Environments

The Robotic Vision Scene Understanding Challenge

Cognition inspired framework for indoor scene annotation

Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning

What can I do around here? Deep functional scene understanding for cognitive robots

Sports Video Captioning via Attentive Motion Representation and Group Relationship Modeling

A Robot Object Recognition Method Based on Scene Text Reading in Home Environments

Captioning Videos Using Large-Scale Image Corpus

Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding