Abstract: The emergence of Segment Anything (SAM) sparked research interest in the field of interactive segmentation, especially in the context of image editing tasks and speeding up data annotation. Unlike common semantic segmentation, interactive segmentation methods allow users to directly influence their output through prompts (e.g. clicks). However, click patterns in real-world interactive segmentation scenarios remain largely unexplored. Most methods rely on the assumption that users would click in the center of the largest erroneous area. Nevertheless, recent studies show that this is not always the case. Thus, methods may have poor performance in real-world deployment despite high metrics in a baseline benchmark. To accurately simulate real-user clicks, we conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks that closely resemble actual user inputs. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also the robustness w.r.t. click patterns. According to our benchmark, in real-world usage interactive segmentation models may perform worse than has been reported in the baseline benchmark, and most of the methods are not robust. We believe that RClicks is a significant step towards creating interactive segmentation methods that provide the best user experience in real-world cases.

What problem does this paper attempt to address?

This paper addresses the gap between the high benchmark scores of existing interactive segmentation methods and their weaker performance in practical use. Current methods usually assume that the user clicks at the center of the largest erroneous area, but this assumption does not always hold in real-world scenarios, so the methods may perform worse than expected in deployment. To assess real-world performance more accurately, the authors propose a new benchmark called RClicks. It builds on 475K real-user clicks collected in a large crowdsourcing study and on a clickability model that samples simulated clicks closely resembling real-user behavior. RClicks evaluates not only the average quality of the methods but also their robustness to different click patterns.

### Main contributions of the paper:

1. **Large-scale multi-round interaction dataset**: The authors collected a large-scale dataset containing multiple interaction rounds, covering a variety of image segmentation tasks.
2. **New click-sampling strategy**: Based on the clickability model, a click-sampling method more realistic than the baseline strategy is proposed.
3. **RClicks benchmark**: The real-world performance of interactive segmentation methods is evaluated using the clickability model, revealing the deficiencies of existing methods in practical use.
4. **First-round real-user click evaluation**: Using the collected first-round real-user clicks, the performance of the segmentation methods is evaluated, and a method for estimating the segmentation difficulty of each instance in the dataset is proposed.
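The contrast between the baseline click strategy and sampling from a clickability map can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the baseline is approximated as "the pixel of the largest error region that lies deepest inside it" (using 4-neighbour erosion depth), and the clickability map passed to `sample_clicks` is assumed to be any non-negative array, which in the paper would come from the learned model.

```python
from collections import deque

import numpy as np


def largest_error_region(pred, gt):
    """Boolean mask of the largest 4-connected region where pred and gt disagree."""
    error = pred != gt
    seen = np.zeros_like(error, dtype=bool)
    h, w = error.shape
    best = []
    for sy, sx in zip(*np.nonzero(error)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, region = deque([(sy, sx)]), []
        while queue:  # breadth-first flood fill of one error region
            y, x = queue.popleft()
            region.append((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and error[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(region) > len(best):
            best = region
    mask = np.zeros_like(error, dtype=bool)
    if best:
        ys, xs = np.array(best).T
        mask[ys, xs] = True
    return mask


def baseline_click(region):
    """Assumed baseline: the region pixel that survives the most erosion steps."""
    current = np.pad(region, 1, constant_values=False)
    depth = np.zeros(current.shape, dtype=int)
    while current.any():
        depth += current
        # A pixel survives only if it and its four neighbours are inside the region.
        current = (current
                   & np.roll(current, 1, 0) & np.roll(current, -1, 0)
                   & np.roll(current, 1, 1) & np.roll(current, -1, 1))
    depth = depth[1:-1, 1:-1]
    y, x = np.unravel_index(np.argmax(depth), depth.shape)
    return int(y), int(x)


def sample_clicks(click_prob, n, rng):
    """Draw n (y, x) clicks with probability proportional to a clickability map."""
    p = click_prob.ravel() / click_prob.sum()
    idx = rng.choice(p.size, size=n, p=p)
    ys, xs = np.unravel_index(idx, click_prob.shape)
    return list(zip(ys.tolist(), xs.tolist()))


# Toy usage: the whole object is missed, so the error region equals the object.
gt = np.zeros((7, 7), dtype=bool)
gt[1:6, 1:6] = True
pred = np.zeros_like(gt)
region = largest_error_region(pred, gt)
center_click = baseline_click(region)
# Placeholder clickability map (uniform over the error region), not a learned one.
realistic_clicks = sample_clicks(region.astype(float), 5, np.random.default_rng(0))
```

The baseline always returns one deterministic point, whereas sampling yields a spread of plausible clicks, which is what makes robustness measurable at all.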
### Core problems of the paper:

- **Evaluation bias of interactive segmentation methods**: Existing evaluations rely on simple strategies that assume how users click, which may lead to overfitting to that strategy and to overestimated performance.
- **Complexity of real-user clicking behavior**: Real clicks are affected by multiple factors, and simple click strategies cannot fully capture this complexity.

### Solutions:

- **Clickability model**: Drawing on ideas from visual saliency prediction, a clickability model is proposed to generate simulated clicks that are closer to real-user clicking behavior.
- **RClicks benchmark**: Using large-scale real-user click data and the clickability model, a more comprehensive and realistic evaluation framework for interactive segmentation methods is provided.

Through these improvements, the authors hope to promote the development of interactive segmentation methods that provide a better user experience in practical applications.
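To make the idea of evaluating robustness to click patterns concrete, here is a small sketch. Everything in it is an assumption for illustration: `toy_segmenter` is a hypothetical stand-in for a real model (it simply draws a disc around the click), and the reported statistics (mean, standard deviation, and worst-case IoU over a set of clicks) are generic choices, not necessarily the metrics RClicks defines.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0


def toy_segmenter(shape, click, radius=3):
    """Hypothetical model: predicts a disc of fixed radius around the click."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    return (yy - click[0]) ** 2 + (xx - click[1]) ** 2 <= radius ** 2


def robustness_report(segment, gt, clicks):
    """Score one first-round click at a time and summarize the spread."""
    scores = np.array([iou(segment(gt.shape, c), gt) for c in clicks])
    return {
        "mean": float(scores.mean()),    # average quality over click patterns
        "std": float(scores.std()),      # sensitivity to where the user clicks
        "worst": float(scores.min()),    # worst case among the given clicks
    }


# Toy usage: a disc-shaped object, one centered click and two off-center clicks.
gt = toy_segmenter((17, 17), (8, 8))
report = robustness_report(toy_segmenter, gt, [(8, 8), (8, 10), (6, 8)])
print(report)
```

A method evaluated only on the centered click would look perfect here, while the off-center clicks expose how much quality depends on click position.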

RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation
