Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stage, and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For $300\times 300$ input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on an Nvidia Titan X, and for $500\times 500$ input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
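
As a rough illustration of the default-box idea described in the abstract, the sketch below tiles a set of boxes with several aspect ratios over every location of a few square feature maps. It is not the authors' implementation: the function name `default_boxes`, the feature map sizes, the scales, the aspect ratios, and the width/height formula (`scale * sqrt(ar)` and `scale / sqrt(ar)`) are all assumptions chosen for illustration.

```python
from itertools import product
from math import sqrt


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return normalized (cx, cy, w, h) boxes for a square feature map.

    One box per aspect ratio is centered on every feature map cell.
    The width/height parameterization is an assumption of this sketch.
    """
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        # Center of the cell, normalized to [0, 1].
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * sqrt(ar), scale / sqrt(ar)))
    return boxes


# Hypothetical multi-resolution setup: (feature map size, box scale).
# Coarser maps get larger boxes so they can cover larger objects.
feature_maps = [(8, 0.2), (4, 0.5), (2, 0.8)]
ratios = (1.0, 2.0, 0.5)

all_boxes = [
    box
    for size, scale in feature_maps
    for box in default_boxes(size, scale, ratios)
]
print(len(all_boxes))
```

In a detector of this kind, the network would then predict, for each of these boxes, one score per object category plus four offsets that adjust the box toward the object. Combining boxes from maps of different resolutions is what lets a single forward pass cover objects of various sizes.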

Bounding Box Embedding for Single Shot Person Instance Segmentation

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

SSD: Single Shot MultiBox Detector

One-Shot Instance Segmentation

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Box-supervised Instance Segmentation with Level Set Evolution

PolarMask: Single Shot Instance Segmentation With Polar Representation

Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes

Deep Level Set for Box-supervised Instance Segmentation in Aerial Images

BorderPointsMask: One-stage Instance Segmentation with Boundary Points Representation

Learning Universal Shape Dictionary for Realtime Instance Segmentation

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

BoxInst: High-Performance Instance Segmentation with Box Annotations

Object Bounding Box-Aware Embedding for Point Cloud Instance Segmentation

iFS-RCNN: An Incremental Few-shot Instance Segmenter

Deep Markov Clustering for Panoptic Segmentation

A Two-Pipeline Instance Segmentation Network via Boundary Enhancement for Scene Understanding

Boundary-Aware Instance Segmentation

SSAP: Single-Shot Instance Segmentation With Affinity Pyramid

UniInst: Unique representation for end-to-end instance segmentation