RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation

Anton Antonov, Andrey Moskalenko, Denis Shepelev, Alexander Krapukhin, Konstantin Soshin, Anton Konushin, Vlad Shakhuro
2024-10-24
Abstract: The emergence of Segment Anything (SAM) sparked research interest in the field of interactive segmentation, especially in the context of image editing tasks and speeding up data annotation. Unlike common semantic segmentation, interactive segmentation methods allow users to directly influence their output through prompts (e.g. clicks). However, click patterns in real-world interactive segmentation scenarios remain largely unexplored. Most methods rely on the assumption that users would click in the center of the largest erroneous area. Nevertheless, recent studies show that this is not always the case. Thus, methods may have poor performance in real-world deployment despite high metrics in a baseline benchmark. To accurately simulate real-user clicks, we conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks, which closely resemble actual user inputs. Using our model and dataset, we propose RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also the robustness w.r.t. click patterns. According to our benchmark, in real-world usage interactive segmentation models may perform worse than it has been reported in the baseline benchmark, and most of the methods are not robust. We believe that RClicks is a significant step towards creating interactive segmentation methods that provide the best user experience in real-world cases.
Computer Vision and Pattern Recognition, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the gap between the high benchmark scores of interactive segmentation methods and their weaker performance in practical use. Current methods usually assume that the user will click at the center of the largest erroneous area, but this assumption does not always hold in real-world scenarios. As a result, these methods may perform worse than expected in actual deployments.

To assess the real-world performance of interactive segmentation methods more accurately, the authors propose a new benchmark called RClicks. They collected 475K real-user clicks in a large-scale crowdsourcing study and developed a clickability model that generates simulated clicks closely resembling real user behavior. RClicks evaluates not only the average quality of the methods but also their robustness to different click patterns.

### Main contributions of the paper:

1. **Large-scale multi-round interaction dataset**: The authors have collected a large-scale dataset containing multiple interaction rounds, covering a variety of image segmentation tasks.
2. **New click-sampling strategy**: Based on the clickability model, a click-sampling method that is more realistic than the baseline strategy is proposed.
3. **RClicks benchmark**: The real-world performance of interactive segmentation methods is evaluated using the clickability model, revealing the deficiencies of existing methods in practical use.
4. **First-round real-user click evaluation**: Using the collected first-round real-user clicks, the performance of the segmentation methods is evaluated, and a method for estimating the segmentation difficulty of each instance in the dataset is proposed.
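
For reference, the baseline assumption described above (the user clicks at the center of the largest erroneous area) might be sketched as follows. This is a minimal illustration rather than the code of any particular benchmark: the function names are hypothetical, masks are plain nested lists of 0/1, and "center" is approximated as the pixel farthest from the region boundary under a city-block (BFS) distance instead of a Euclidean distance transform.

```python
from collections import deque


def _neighbors(y, x):
    """4-connected neighbours of a pixel (may fall outside the image)."""
    return ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))


def largest_error_region(pred, gt):
    """Largest 4-connected region of same-type errors.

    A region contains only false negatives or only false positives.
    """
    h, w = len(gt), len(gt[0])
    seen, best = set(), set()
    for r in range(h):
        for c in range(w):
            if pred[r][c] == gt[r][c] or (r, c) in seen:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in _neighbors(y, x):
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen
                            and pred[ny][nx] != gt[ny][nx]
                            and gt[ny][nx] == gt[r][c]):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def baseline_click(pred, gt):
    """Return (row, col, is_positive) for the simulated click, or None."""
    region = largest_error_region(pred, gt)
    if not region:
        return None  # prediction already matches the ground truth
    # Multi-source BFS: pixels touching the outside of the region
    # (including the image border) start at distance 1.
    dist = {p: 1 for p in region
            if any(n not in region for n in _neighbors(*p))}
    queue = deque(sorted(dist))
    while queue:
        p = queue.popleft()
        for n in _neighbors(*p):
            if n in region and n not in dist:
                dist[n] = dist[p] + 1
                queue.append(n)
    # The innermost pixel of the region stands in for its "center".
    y, x = max(sorted(region), key=dist.get)
    # A click on a missed object pixel is positive, otherwise negative.
    return y, x, bool(gt[y][x])
```

Because this strategy is deterministic, every evaluation run produces the same click sequence for a given image, which is the property the realistic click sampling in RClicks is meant to relax.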
### Core problems of the paper:

- **Evaluation bias of interactive segmentation methods**: Existing evaluation protocols rely on simple strategies that make assumptions about user clicking behavior, which may lead to overfitting and overestimated performance.
- **Complexity of real-user clicking behavior**: User clicking behavior is affected by multiple factors, and simple click strategies cannot fully capture this complexity.

### Solutions:

- **Clickability model**: Drawing on ideas from visual saliency prediction, a clickability model is proposed to generate simulated clicks that are closer to real-user clicking behavior.
- **RClicks benchmark**: Using the large-scale real-user click data and the clickability model, a more comprehensive and realistic evaluation framework for interactive segmentation methods is provided.

Through these improvements, the authors hope to promote the development of interactive segmentation methods that provide a better user experience in practical applications.
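
To make the idea of sampling clicks and measuring robustness more concrete, here is a small sketch under stated assumptions. It treats the output of a clickability model as a non-negative per-pixel map, samples clicks in proportion to that map, and summarizes segmentation quality over the sampled clicks. The names `sample_clicks`, `iou`, and `summarize_over_clicks`, the `segment` callable, and the choice of mean, standard deviation, and worst case as summary statistics are all illustrative assumptions, not the metrics or API defined in the paper.

```python
import random
import statistics


def sample_clicks(clickability, n, rng):
    """Draw n (row, col) clicks with probability proportional to the map."""
    coords = [(r, c)
              for r, row in enumerate(clickability)
              for c in range(len(row))]
    weights = [value for row in clickability for value in row]
    return rng.choices(coords, weights=weights, k=n)


def iou(pred, gt):
    """Intersection over union of two binary masks given as nested lists."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 1.0


def summarize_over_clicks(segment, gt, clicks):
    """Score a (hypothetical) click-to-mask function over many clicks.

    The mean reflects average quality; the spread and the worst case
    are simple stand-ins for robustness to the click position.
    """
    scores = [iou(segment(click), gt) for click in clicks]
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "worst": min(scores),
    }
```

A call such as `summarize_over_clicks(model_fn, gt_mask, sample_clicks(click_map, 100, random.Random(0)))` would then give one summary per instance, where `model_fn` is whatever wrapper maps a single click to a predicted mask. A method with a high mean but a low worst-case score would be accurate on average yet sensitive to where the user happens to click.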