Abstract: Detecting human-object interactions (HOIs) is a challenging problem in computer vision. Existing HOI detection methods rely heavily on appearance-based features, which may not capture all the characteristics needed for accurate detection. To address this limitation, we propose a graph-based approach called TMGHOI (Translational Model for Human-Object Interaction Detection). Our method captures the sentiment representation of HOIs by integrating spatial and semantic knowledge. We represent each HOI as a graph in which the interaction components serve as nodes and their spatial relationships serve as edges. TMGHOI uses separate spatial and semantic encoders to extract the corresponding information, and the resulting encodings are combined to construct a knowledge graph that captures the sentiment representation of HOIs. The ability to incorporate prior knowledge further improves the understanding of interactions and, in turn, detection accuracy. We evaluated TMGHOI extensively on the widely used HICO-DET dataset, where it outperformed existing state-of-the-art graph-based methods by a significant margin. These results suggest that TMGHOI can improve both the accuracy and the efficiency of HOI detection. Its integration of spatial and semantic knowledge, together with its computational efficiency and practicality, makes it a useful tool for researchers and practitioners in the computer vision community. Further evaluation on additional datasets is needed to establish the generalizability and robustness of the proposed method.
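
The graph representation described in the abstract (interaction components as nodes, spatial relationships as edges) can be illustrated with a minimal sketch. This is not the TMGHOI implementation: the names `Node`, `iou`, `spatial_edge`, and `build_hoi_graph` are hypothetical, and the choice of edge features (normalised centre offset and box overlap) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Axis-aligned bounding box as (x1, y1, x2, y2). Assumed input format.
Box = Tuple[float, float, float, float]


@dataclass
class Node:
    """One interaction component: a detected human or object."""
    name: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spatial_edge(human: Node, obj: Node) -> Dict[str, float]:
    """Illustrative spatial features for a human-object edge.

    The centre offset is normalised by the human box size so that the
    features do not depend on image resolution (an assumed design choice).
    """
    hx, hy = (human.box[0] + human.box[2]) / 2, (human.box[1] + human.box[3]) / 2
    ox, oy = (obj.box[0] + obj.box[2]) / 2, (obj.box[1] + obj.box[3]) / 2
    hw, hh = human.box[2] - human.box[0], human.box[3] - human.box[1]
    return {
        "dx": (ox - hx) / hw,
        "dy": (oy - hy) / hh,
        "iou": iou(human.box, obj.box),
    }


def build_hoi_graph(human: Node, objects: List[Node]) -> dict:
    """Build a star graph: the human node linked to every object node."""
    nodes = [human] + objects
    edges = {(human.name, o.name): spatial_edge(human, o) for o in objects}
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    person = Node("person", (0.0, 0.0, 2.0, 2.0))
    bike = Node("bicycle", (1.0, 1.0, 3.0, 3.0))
    graph = build_hoi_graph(person, [bike])
    print(graph["edges"][("person", "bicycle")])
```

In a full system, each node would also carry the semantic encoding mentioned in the abstract; the sketch covers only the spatial side.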

Detecting Human-Object Interactions in Videos by Modeling the Trajectory of Objects and Human Skeleton

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

Spatial-Temporal Human-Object Interaction Detection

Hierarchical HOI Detection Framework Augmented by Human Interactive Intention

Skeleton-Based Interactive Graph Network for Human Object Interaction Detection

TMHOI: Translational Model for Human-Object Interaction Detection

Spatial Parsing and Dynamic Temporal Pooling Networks for Human-Object Interaction Detection

Effective Actor-centric Human-object Interaction Detection

Learning Human-Object Interaction via Interactive Semantic Reasoning

Recognising Human-Object Interaction Via Exemplar Based Modelling

Detecting Zero-Shot Human-Object Interaction with Visual-Text Modeling

Human–object Interaction Recognition Based on Interactivity Detection and Multi-Feature Fusion

Graph-based Method for Human-Object Interactions Detection

Exploring Pose-Aware Human-Object Interaction Via Hybrid Learning

Human-Object Interaction Prediction in Videos through Gaze Following

A Human-Object Interaction Detection Method Inspired by Human Body Part Information

Human-Object Interaction Recognition by Modeling Context

A Review of Human-Object Interaction Detection

Modeling 4D Human-Object Interactions for Joint Event Segmentation, Recognition, and Object Localization