Abstract: Instance segmentation can detect where objects are in an image, but it struggles to capture the relationships between them. We focus on one typical relationship: relative saliency. A closely related task, salient object detection, predicts a binary map highlighting visually salient regions but has difficulty distinguishing between multiple objects. Directly combining the two tasks through post-processing also leads to poor performance. Relative saliency is currently under-studied, which limits practical applications such as content-aware image cropping, video summarization, and image labeling. In this paper, we study the Salient Object Ranking (SOR) task, which assigns a ranking order to each detected object according to its visual saliency. We propose the first end-to-end framework for the SOR task and solve it in a multi-task learning fashion. The framework handles instance segmentation and salient object ranking simultaneously. In this framework, the SOR branch is independent and can flexibly cooperate with different detection methods, making it easy to use as a plug-in. We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch. It consists of a position embedding stage and a feature interaction stage. Considering the importance of position in saliency comparison, in the first stage we preserve the absolute coordinates of objects in the ROI pooling operation and then fuse this positional information with semantic features. In the feature interaction stage, we apply an attention mechanism to obtain contextualized representations of the proposals, which are used to predict their relative ranking orders. Extensive experiments have been conducted on the ASR dataset. Without bells and whistles, our proposed method significantly outperforms the previous state-of-the-art method. The code will be made publicly available at https://github.com/EricFH/SOR.
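The two PPA stages described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation (that is slated for the repository linked above). Every function name here is hypothetical, and several choices are assumptions: box corners are normalized by image size and encoded sinusoidally, positional and semantic features are fused by element-wise addition, the interaction stage is single-head scaled dot-product self-attention without learned projections (plus a residual connection), and the ranking head is a hypothetical linear vector `score_w` whose scores are sorted so that rank 1 is the most salient object.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # absolute (x1, y1, x2, y2) in pixels


def position_embedding(box: Box, img_w: float, img_h: float, dim: int) -> Vector:
    """Sinusoidal encoding of a box's absolute corners, normalized by image size.

    Assumption: the paper's actual embedding may differ. `dim` must be a
    multiple of 8 (4 coordinates, each encoded with sin/cos pairs).
    """
    if dim % 8 != 0:
        raise ValueError("dim must be a multiple of 8")
    coords = (box[0] / img_w, box[1] / img_h, box[2] / img_w, box[3] / img_h)
    pairs = dim // 8  # sin/cos pairs per coordinate
    emb: Vector = []
    for c in coords:
        for i in range(pairs):
            freq = math.pi * 2 ** i  # geometric ladder of frequencies
            emb.append(math.sin(c * freq))
            emb.append(math.cos(c * freq))
    return emb


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    Simplification: no learned query/key/value projections.
    """
    d = len(tokens[0])
    out: List[Vector] = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def rank_objects(features: Sequence[Vector], boxes: Sequence[Box],
                 img_w: float, img_h: float, score_w: Vector) -> List[int]:
    """Return one rank per object (1 = most salient) from hypothetical ROI features."""
    dim = len(features[0])
    # Stage 1 (assumed fusion by addition): inject absolute position into semantics.
    fused = [
        [f + p for f, p in zip(feat, position_embedding(box, img_w, img_h, dim))]
        for feat, box in zip(features, boxes)
    ]
    # Stage 2: contextualize each proposal against all others, plus a residual.
    ctx = [[a + b for a, b in zip(c, x)] for c, x in zip(self_attention(fused), fused)]
    # Hypothetical linear head: one saliency score per proposal.
    scores = [sum(w * v for w, v in zip(score_w, c)) for c in ctx]
    # Sort by score (descending, stable) and convert positions to ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


if __name__ == "__main__":
    feats = [[0.2] * 8, [0.5] * 8, [0.1] * 8]  # toy ROI features, dim = 8
    boxes = [(10, 10, 60, 60), (40, 30, 120, 110), (150, 5, 190, 40)]
    print(rank_objects(feats, boxes, 200, 120, score_w=[1.0] * 8))
```

In this sketch, attention makes each proposal's score depend on every other proposal, so the ranking is computed jointly rather than object by object, and keeping absolute coordinates in the input lets the score reflect where an object sits in the frame.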

A Software for Rapid Annotation of Scene Objects Based on Saliency Object Ranking

Scene Text Detection and Recognition System for Visually Impaired People in Real World

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data

FANS: Face Annotation by Searching Large-scale Web Facial Images. (2013). Research Collection School Of Information Systems

Instance-Level Panoramic Audio-Visual Saliency Detection and Ranking

Automatic image annotation based on salient regions

Annotation-free Audio-Visual Segmentation

SmartAnnotator: An Interactive Tool for Annotating Indoor RGBD Images

Efficient Object Annotation via Speaking and Pointing

OpenAnnotate2: Multi-Modal Auto-Annotating for Autonomous Driving

Crowdsourcing System for Multi-object Annotation in Surveillance Videos

Transcending Pixels: Boosting Saliency Detection via Scene Understanding from Aerial Imagery

Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model

Object Recognition System for the Visually Impaired: A Deep Learning Approach using Arabic Annotation

Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment

Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach

Automatic Tag Saliency Ranking for Stereo Images

Semi-automatic Dynamic Auxiliary-Tag-aided Image Annotation

Salient Object Ranking with Position-Preserved Attention

A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!

Label Critic: Design Data Before Models