Abstract: Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent works have shown that using a pose graph (i.e., treating keypoints as nodes in a graph rather than isolated points) helps handle occlusions and break symmetry. However, these methods assume a static pose graph with equal-weight edges, leading to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights to optimize localization. To further leverage structural priors, we propose integrating Markovian Structural Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model's ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy. Our code is publicly available.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper addresses keypoint localization in **Category-Agnostic Pose Estimation (CAPE)**. Specifically, it proposes a new framework, **EdgeCape**, to overcome the limitations of existing methods in handling occlusion, symmetry, and complex object structures.

#### Main problems:

1. **Static, equally weighted graph structure**: Existing methods usually assume a static pose graph in which all edges have the same weight. Under this assumption the model cannot effectively handle complex object structures and occlusions, which reduces keypoint localization accuracy.
2. **Lack of an adaptive graph structure**: Traditional CAPE methods use a fixed binary skeleton definition and cannot adjust the graph structure to each input instance. This limits generalization to new categories and complex geometric shapes.
3. **Insufficient capture of global spatial dependencies**: Existing methods struggle to capture global spatial dependencies between keypoints, especially multi-hop relationships.

#### Solutions:

- **Adaptive edge-weight prediction**: By predicting the edge weights of the graph, the model can learn complex, instance-specific pose graphs rather than relying solely on the fixed binary skeleton definition.
- **Enhanced graph-based architecture**: Markovian Structural Bias modulates the self-attention between nodes according to the number of hops between them, so the model better captures complex spatial dependencies between keypoints.
- **Improved keypoint localization accuracy**: On the MP-100 benchmark, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy.
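The idea of replacing an equal-weight skeleton with predicted edge weights can be illustrated with a minimal sketch. It is not EdgeCape's implementation: the function names, the additive `residual` matrix (standing in for whatever a network would predict), and the clamp-then-row-normalize step are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def skeleton_adjacency(num_keypoints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Build the symmetric 0/1 adjacency matrix of a binary skeleton."""
    adj = [[0.0] * num_keypoints for _ in range(num_keypoints)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    return adj


def weighted_adjacency(prior: Matrix, residual: Matrix) -> Matrix:
    """Combine the binary prior with a hypothetical predicted residual.

    The residual is symmetrized and added to the prior, negative values are
    clamped to zero, the diagonal is kept at zero, and each non-empty row is
    normalized to sum to 1. Row normalization can make the final matrix
    asymmetric.
    """
    n = len(prior)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            sym = 0.5 * (residual[i][j] + residual[j][i])
            weights[i][j] = max(0.0, prior[i][j] + sym)
    for row in weights:
        total = sum(row)
        if total > 0.0:
            for j in range(n):
                row[j] /= total
    return weights


# Usage: a 3-keypoint chain 0-1-2 whose skeleton has no edge between 0 and 2.
edges = [(0, 1), (1, 2)]
prior = skeleton_adjacency(3, edges)
# Assumed residual: the (hypothetical) predictor adds a weak 0-2 connection.
residual = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
weights = weighted_adjacency(prior, residual)
```

With a zero residual the result reduces to the normalized binary skeleton, so the fixed graph acts as a prior that the predicted weights refine per instance.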
### Summary:

By introducing adaptive edge-weight prediction and Markovian Structural Bias, the paper addresses the limitations of existing CAPE methods in handling complex object structures and occlusion, improving both keypoint localization accuracy and the model's generalization ability.
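A bias on self-attention that depends on the number of hops between nodes can be sketched as follows. This is only an illustration under stated assumptions: hop counts come from a breadth-first search over the skeleton, and the bias is a simple per-hop scalar table (`hop_bias`, a hypothetical stand-in for learned values) added to the attention logits before the softmax. The paper's actual Markovian formulation may differ.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple


def hop_distances(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[list]:
    """All-pairs hop counts via BFS; unreachable pairs stay at math.inf."""
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def biased_attention(
    logits: List[List[float]], hops: List[list], hop_bias: Sequence[float]
) -> List[List[float]]:
    """Row-wise softmax of logits plus a per-hop scalar bias.

    Hop counts beyond the table (including unreachable pairs) reuse the
    last entry of hop_bias.
    """
    attention = []
    for row_logits, row_hops in zip(logits, hops):
        scores = []
        for logit, hop in zip(row_logits, row_hops):
            index = min(hop, len(hop_bias) - 1)
            scores.append(logit + hop_bias[index])
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attention.append([e / total for e in exps])
    return attention


# Usage: a 4-keypoint chain 0-1-2-3 with uniform logits.
chain_edges = [(0, 1), (1, 2), (2, 3)]
hops = hop_distances(4, chain_edges)
uniform_logits = [[0.0] * 4 for _ in range(4)]
# Assumed bias table: nodes further away receive a lower bias.
attn = biased_attention(uniform_logits, hops, hop_bias=[0.0, -1.0, -2.0])
```

With uniform logits, the attention each keypoint pays to the others is determined entirely by graph distance, which shows how structural information can reach nodes that are several hops apart rather than only direct neighbors.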

Edge Weight Prediction For Category-Agnostic Pose Estimation

A Graph-Based Approach for Category-Agnostic Pose Estimation

CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation

SCAPE: A Simple and Strong Category-Agnostic Pose Estimator

Meta-Point Learning and Refining for Category-Agnostic Pose Estimation

Context-Guided Adaptive Network for Efficient Human Pose Estimation

Pose for Everything: Towards Category-Agnostic Pose Estimation

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

GPT-COPE: A Graph-Guided Point Transformer for Category-Level Object Pose Estimation

Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Towards Real-World Category-level Articulation Pose Estimation

Category-Level Object Pose Estimation with Statistic Attention

EANet: Towards Lightweight Human Pose Estimation With Effective Aggregation Network

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

SD-Pose: facilitating space-decoupled human pose estimation via adaptive pose perception guidance

Probabilistic Category-Level Pose Estimation via Segmentation and Predicted-Shape Priors

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size Estimation

Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation

Overcoming Data Deficiency for Multi-Person Pose Estimation

Multi-Scale Structure-Aware Network for Human Pose Estimation