Abstract: The availability of vast amounts of visual data with heterogeneous features is a key factor in developing, testing, and benchmarking new computer vision (CV) algorithms and architectures. Most visual datasets are created and curated for specific tasks, or cover a limited image data distribution for very specific situations, and there is no unified approach to managing and accessing them across diverse sources, tasks, and taxonomies. This not only creates unnecessary overhead when building robust visual recognition systems, but also introduces biases into learning systems and limits the capabilities of data-centric AI. To address these problems, we propose the Vision Knowledge Graph (VisionKG), a novel resource that interlinks, organizes, and manages visual datasets via knowledge graphs and Semantic Web technologies. It can serve as a unified framework that facilitates simple access to and querying of state-of-the-art visual datasets, regardless of their heterogeneous formats and taxonomies. One of the key differences between our approach and existing methods is that ours is knowledge-based rather than metadata-based. It enriches the semantics at both the image and instance levels and offers various data retrieval and exploratory services via SPARQL. VisionKG currently contains 519 million RDF triples that describe approximately 40 million entities, which are accessible at https://vision.semkg.org and through APIs. With the integration of 30 datasets and four popular CV tasks, we demonstrate its usefulness across various scenarios when working with CV pipelines.
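To make the idea of SPARQL-based retrieval concrete, the following is a minimal sketch of how a client might assemble a query for images annotated with a given label, independent of the dataset they come from. The abstract does not describe VisionKG's schema, so the namespace `http://example.org/vision#`, the properties `ex:hasAnnotation`, `ex:partOfDataset`, and `ex:hasLabel`, and the helper `build_image_query` are all hypothetical placeholders, not the actual VisionKG vocabulary or API. The sketch only builds the query text and does not contact any endpoint.

```python
from string import Template

# ASSUMPTION: placeholder namespace. The real VisionKG vocabulary is not
# given in the abstract and will differ from this.
EXAMPLE_NS = "http://example.org/vision#"

# ASSUMPTION: hypothetical property names (hasAnnotation, partOfDataset,
# hasLabel) chosen only to illustrate the shape of such a query.
_QUERY = Template("""\
PREFIX ex: <$ns>
SELECT ?image ?dataset WHERE {
  ?image ex:hasAnnotation ?ann ;
         ex:partOfDataset ?dataset .
  ?ann ex:hasLabel "$label" .
}
LIMIT $limit
""")


def build_image_query(label: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT query for images carrying the given label."""
    # Accept only plain alphanumeric words so the label cannot break out of
    # the quoted SPARQL literal.
    if not label.replace(" ", "").isalnum():
        raise ValueError("label must consist of alphanumeric words")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return _QUERY.substitute(ns=EXAMPLE_NS, label=label, limit=limit)


if __name__ == "__main__":
    print(build_image_query("person", limit=5))
```

Because the query matches on a label rather than on a dataset-specific file layout, the same request could in principle return images from every integrated dataset that carries that label, which is the kind of unified access the abstract describes.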

Open-World Visual Recognition Using Knowledge Graphs

Towards Open World Recognition

Learning Visual Models using a Knowledge Graph as a Trainer

General Knowledge Embedded Image Representation Learning

Knowledge-Embedded Mutual Guidance for Visual Reasoning

Iterative Visual Relationship Detection via Commonsense Knowledge Graph

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph

Semantic-visual shared knowledge graph for zero-shot learning

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

Sample-Efficient Learning of Novel Visual Concepts

Contrastive Object Detection Using Knowledge Graph Embeddings

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation

Hyperbolic Learning with Synthetic Captions for Open-World Detection

Boosting Deep Open World Recognition by Clustering

Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

Relation-Aware Reasoning with Graph Convolutional Network

Visual Representation Learning Guided By Multi-modal Prior Knowledge

ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition