Abstract: Efficient single instance segmentation is essential for unlocking features in mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications scope the segmentation task to portraits or the salient subject due to computational constraints. Instance segmentation, despite recent developments towards efficient networks, is still heavy because it computes over the entire image to identify all instances. To address this, we propose and formulate a one-tap-driven single instance segmentation task that segments a single instance selected by a user via a positive tap. This task, in contrast to the broader task of segmenting anything as suggested in the Segment Anything Model \cite{sam}, focuses on efficient segmentation of a single instance specified by the user. To solve this problem, we present TraceNet, which explicitly locates the selected instance by way of receptive field tracing. TraceNet identifies image regions that are related to the user tap, and heavy computations are performed only on the selected regions of the image. Therefore, the overall computation cost and memory consumption are reduced during inference. We evaluate the performance of TraceNet on instance IoU averaged over taps and on the proportion of the region that a user tap can fall into for a high-quality single-instance mask. Experimental results on MS-COCO and LVIS demonstrate the effectiveness and efficiency of the proposed approach. TraceNet jointly achieves efficiency and interactivity, filling the gap between the need for efficient mobile inference and the recent research trend towards multimodal and interactive segmentation models.
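The two evaluation quantities named in the abstract (instance IoU averaged over taps, and the proportion of the region a tap can fall into while still yielding a high-quality mask) can be illustrated with a minimal sketch. This is not the authors' evaluation code: `segment_fn` is a hypothetical callable standing in for any tap-driven segmenter, and the `iou_threshold` of 0.8 used to define "high-quality" is an assumption, since the abstract does not state one.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def tap_metrics(gt_mask, segment_fn, iou_threshold=0.8):
    """Evaluate a tap-driven segmenter on one instance.

    segment_fn(y, x) is a hypothetical callable that returns a boolean
    mask for a tap at pixel (y, x). Every pixel inside gt_mask is tried
    as a tap. iou_threshold is an assumed cut-off for "high quality".

    Returns (mean IoU over taps, fraction of taps with IoU >= threshold).
    """
    gt = np.asarray(gt_mask, dtype=bool)
    taps = np.argwhere(gt)  # all (y, x) positions inside the instance
    if len(taps) == 0:
        raise ValueError("gt_mask contains no instance pixels")
    ious = np.array(
        [mask_iou(segment_fn(int(y), int(x)), gt) for y, x in taps]
    )
    return float(ious.mean()), float((ious >= iou_threshold).mean())
```

In practice one would likely sample taps rather than enumerate every instance pixel; exhaustive enumeration is used here only to keep the sketch short.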

What problem does this paper attempt to address?

This paper aims to solve the problem of efficient single-instance segmentation in mobile imaging applications. Existing mobile imaging applications usually restrict segmentation to portraits or salient objects because computing resources are limited. With growing demand for interactivity and efficiency, the goal is a method that efficiently segments a single instance when the user taps on a specified object.

#### Main problem description:

1. **Limited computing resources**: The computing power of mobile devices is limited, which makes it difficult to run complex instance segmentation algorithms.
2. **Limitations of existing methods**: Existing efficient segmentation methods usually apply only to salient objects (such as portraits) and cannot flexibly handle an arbitrary instance specified by the user.
3. **User experience**: The user's tap may not always fall exactly on the center of the target object, which can lead to poor segmentation results and a worse user experience.

To solve these problems, the authors propose a single-instance segmentation task driven by user taps and design the TraceNet model. TraceNet locates the instance tapped by the user by tracing the receptive field, thereby avoiding unnecessary global computation and improving inference speed and memory efficiency.

#### Core ideas of the solution:

- **User-tap-driven**: The user guides the model to perform segmentation by tapping on an instance in the image.
- **Receptive Field Tracing (RFT)**: The receptive field of the tap point is traced in reverse, so that intensive computation is performed only in the relevant regions rather than on the entire image.
- **Local feature encoding**: Local features are used to dynamically adjust the segmentation mask generator, further improving efficiency.

Through these methods, TraceNet can significantly reduce computing cost and memory consumption while maintaining high precision, and it is especially suitable for real-time image editing and capture applications on mobile devices.

#### Experimental verification:

The authors evaluated TraceNet on the MS-COCO and LVIS datasets, demonstrating its efficiency and accuracy on the single-instance segmentation task. The experimental results show that TraceNet not only improves computing efficiency significantly but also provides high-quality segmentation masks and tolerates imprecise user taps well.
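The idea of tracing a receptive field in reverse can be illustrated with standard convolution arithmetic. The sketch below is not TraceNet's implementation; it only shows, for an assumed stack of convolution layers described by hypothetical `(kernel, stride, padding)` tuples, which input pixels can influence one feature-map cell, so that heavy computation could in principle be restricted to that window. The helper names `trace_interval`, `trace_region`, and `tap_to_cell` are illustrative.

```python
from typing import List, Tuple

# One convolution layer along one axis: (kernel size, stride, padding).
Layer = Tuple[int, int, int]


def trace_interval(lo: int, hi: int, layers: List[Layer]) -> Tuple[int, int]:
    """Map an output index range [lo, hi] back to the input index range
    that feeds it, walking the layers from last to first."""
    for kernel, stride, padding in reversed(layers):
        lo = lo * stride - padding
        hi = hi * stride - padding + kernel - 1
    return lo, hi


def trace_region(fy: int, fx: int, layers: List[Layer],
                 height: int, width: int) -> Tuple[int, int, int, int]:
    """Input window (y0, y1, x0, x1), inclusive, seen by feature cell
    (fy, fx), clipped to an image of size height x width."""
    y0, y1 = trace_interval(fy, fy, layers)
    x0, x1 = trace_interval(fx, fx, layers)
    return max(y0, 0), min(y1, height - 1), max(x0, 0), min(x1, width - 1)


def tap_to_cell(y: int, x: int, layers: List[Layer]) -> Tuple[int, int]:
    """Assumed mapping from a tap pixel to a feature cell: integer
    division by the product of all strides."""
    total_stride = 1
    for _, stride, _ in layers:
        total_stride *= stride
    return y // total_stride, x // total_stride


if __name__ == "__main__":
    # Two 3x3 convolutions with stride 2 and padding 1 (an assumed toy stack).
    stack = [(3, 2, 1), (3, 2, 1)]
    cell = tap_to_cell(17, 17, stack)
    print(cell, trace_region(cell[0], cell[1], stack, 64, 64))
```

For the toy stack, a single output cell traces back to a 7x7 input window, which matches the usual receptive field size for two 3x3 stride-2 convolutions. A real network would also need to account for dilation, pooling, and skip connections.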

TraceNet: Segment one thing efficiently
