Abstract: Prior highly tuned image parsing models are usually studied in a single domain with a specific set of semantic labels, and can hardly be adapted to other scenarios (e.g., ones with a different label granularity) without extensive re-training. Learning a single universal parsing model by unifying label annotations from different domains or at various levels of granularity is a crucial but rarely addressed topic. It poses many fundamental learning challenges, e.g., discovering the underlying semantic structures across different label granularities or mining label correlations across related tasks. To address these challenges, we propose a graph reasoning and transfer learning framework, named "Graphonomy," which incorporates human knowledge and label taxonomy into intermediate graph representation learning, going beyond local convolutions. In particular, Graphonomy learns global and structured semantic coherency in multiple domains via semantic-aware graph reasoning and transfer, so that parsing in each domain benefits the others (e.g., across different datasets or correlated tasks). Graphonomy consists of two iterated modules: Intra-Graph Reasoning and Inter-Graph Transfer. The former extracts the semantic graph in each domain and improves feature representation learning by propagating information through the graph; the latter exploits the dependencies among the graphs from different domains for bidirectional knowledge transfer. We apply Graphonomy to two related but different image understanding research topics, human parsing and panoptic segmentation, and show that it handles both well via a standard pipeline when compared against current state-of-the-art approaches. Moreover, we demonstrate an additional benefit of our framework, e.g., generating human parsing results at various levels of granularity by unifying annotations across different datasets.
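
The two modules described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the soft pixel-to-node assignment, the row-normalized adjacency with self-loops, and the fixed transfer matrix are all assumptions made for illustration, and the learned weights a real model would have are left out.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def row_normalize(adj):
    """Add self-loops to the label graph and scale each row to sum to 1."""
    n = len(adj)
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def intra_graph_reasoning(pixels, assign_logits, adj):
    """Hypothetical intra-graph step for one label domain.

    pixels:        P x D pixel features
    assign_logits: P x N scores linking each pixel to each semantic node
    adj:           N x N label graph (assumed to come from the label taxonomy)
    Returns the refined P x D features and the N x D node features.
    """
    assign = softmax_rows(assign_logits)            # each pixel spreads over nodes
    nodes = matmul(transpose(assign), pixels)       # project pixels onto nodes
    nodes = matmul(row_normalize(adj), nodes)       # propagate along graph edges
    update = matmul(assign, nodes)                  # map node features back to pixels
    refined = [[p + u for p, u in zip(prow, urow)]  # residual connection
               for prow, urow in zip(pixels, update)]
    return refined, nodes

def inter_graph_transfer(src_nodes, transfer):
    """Hypothetical inter-graph step: map source-domain nodes onto target-domain nodes.

    transfer: N_target x N_source weights (assumed given here, e.g. hand-built
    from label overlap; a real model would learn or derive them).
    """
    return matmul(transfer, src_nodes)

# Toy usage: 3 pixels, 2-dimensional features, a 2-node fine-grained label graph,
# and a 1-node coarse graph that merges both fine labels.
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
fine_adj = [[0.0, 1.0], [1.0, 0.0]]
refined, fine_nodes = intra_graph_reasoning(pixels, logits, fine_adj)
coarse_nodes = inter_graph_transfer(fine_nodes, [[0.5, 0.5]])
```

Running the transfer in the opposite direction with a second matrix would give the bidirectional exchange the abstract mentions; the sketch shows only one direction to stay short.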

Graphonomy: Universal Image Parsing Via Graph Reasoning and Transfer

Learning to transfer focus of graph neural network for scene graph parsing

Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks

When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning

GINet: Graph Interaction Network for Scene Parsing

UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs

Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains

AGRNet: Adaptive Graph Representation Learning and Reasoning for Face Parsing

GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives

LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning

Grapy-ML: Graph Pyramid Mutual Learning for Cross-Dataset Human Parsing

Bidirectional Graph Reasoning Network for Panoptic Segmentation

Transformer-induced graph reasoning for multimodal semantic segmentation in remote sensing

Learning to Generate Scene Graph from Natural Language Supervision

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Image Parsing: Unifying Segmentation, Detection, and Recognition

Weakly Supervised Graph Propagation Towards Collective Image Parsing

Scene Graph Parsing as Dependency Parsing

Weakly-Supervised Image Parsing Via Constructing Semantic Graphs and Hypergraphs

All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining

GraphText: Graph Reasoning in Text Space