Objects as Points

Xingyi Zhou, Dequan Wang, Philipp Krähenbühl

DOI: https://doi.org/10.48550/arXiv.1904.07850

2019-04-26

Abstract: Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes on the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
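To make the center-point representation concrete, here is a minimal sketch of how one box could be turned into training targets. It assumes, as in the paper, an output stride of 4 and a Gaussian peak splatted at the low-resolution center, with overlapping Gaussians combined by element-wise maximum. The helper name `encode_centers`, the single-class heatmap, the pure-Python grid, and the default `sigma` heuristic are illustrative assumptions; the paper uses one heatmap channel per class and an object-size-adaptive standard deviation following CornerNet.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def encode_centers(boxes: List[Box], img_w: int, img_h: int, stride: int = 4,
                   sigma: Optional[float] = None):
    """Hypothetical helper: build one center heatmap plus size/offset targets.

    Single-class only (an assumption); the paper keeps one channel per class.
    """
    out_w, out_h = img_w // stride, img_h // stride
    heatmap = [[0.0] * out_w for _ in range(out_h)]
    targets = []
    for x1, y1, x2, y2 in boxes:
        # Box center mapped onto the low-resolution output grid.
        cx, cy = (x1 + x2) / 2 / stride, (y1 + y2) / 2 / stride
        px, py = int(cx), int(cy)  # integer peak location (floor for positive values)
        w, h = x2 - x1, y2 - y1    # size target, kept in input pixels here
        # Assumed sigma heuristic, NOT the CornerNet radius rule used by the paper.
        s = sigma if sigma is not None else max(1.0, min(w, h) / stride / 6.0)
        for y in range(out_h):
            for x in range(out_w):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * s * s))
                # Where Gaussians of different objects overlap, keep the maximum.
                heatmap[y][x] = max(heatmap[y][x], g)
        targets.append({
            "center": (px, py),
            "offset": (cx - px, cy - py),  # sub-cell error lost to the stride
            "size": (w, h),
        })
    return heatmap, targets


heatmap, targets = encode_centers([(10, 10, 40, 30)], img_w=64, img_h=64)
print(targets[0])
```

For a box `(10, 10, 40, 30)` in a 64×64 image, the center lands on grid cell `(6, 5)` with a sub-cell offset of `(0.25, 0.0)`. The offset target is what lets the detector recover the discretization error introduced by the stride.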

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### The Problems This Paper Attempts to Solve

This paper aims to address the following main issues:

1. **Simplifying Object Detection**
   - Most current object detectors enumerate a nearly exhaustive list of potential object locations and classify each one, which is wasteful, inefficient, and requires additional post-processing.
   - The paper instead represents each object as a single point, the center of its bounding box, which simplifies the detection pipeline.
2. **Improving Detection Speed and Accuracy**
   - A keypoint estimator finds object center points, and all other attributes (size, 3D location, orientation, and pose) are regressed directly from image features at the center location.
   - The resulting model, CenterNet, is end-to-end differentiable and is simpler, faster, and more accurate than corresponding bounding-box-based detectors; on MS COCO it reaches 28.1% AP at 142 FPS and 37.4% AP at 52 FPS.
3. **Avoiding Non-Maximum Suppression (NMS)**
   - Most current detectors rely on post-processing such as NMS, which is hard to differentiate and train through, so these detectors are generally not end-to-end trainable.
   - CenterNet avoids NMS by extracting local peaks directly from the keypoint heatmap: it keeps responses that are greater than or equal to their 8-connected neighbors, a test that can be implemented as a 3×3 max-pooling operation.
4. **Extending to Other Tasks**
   - The same approach applies beyond 2D object detection, to 3D bounding box estimation (KITTI) and multi-person human pose estimation (COCO keypoints).

Overall, the paper simplifies object detection and improves its speed and accuracy through a new object representation: the center point.
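The NMS-free decoding step in item 3 can be sketched as follows, under stated assumptions: a single-class heatmap, per-cell offset and size predictions already available as nested lists, sizes expressed in input-image pixels, an output stride of 4, and a score threshold of 0.1. The paper keeps the top 100 peaks, which is the default for `k` here; the function name `decode_peaks` and the `threshold` parameter are hypothetical. A cell counts as a peak when it is at least as large as all its 8-connected neighbors, the same test a 3×3 max-pooling comparison performs.

```python
def decode_peaks(heatmap, offsets, sizes, k=100, threshold=0.1, stride=4):
    """Hypothetical decoder: turn heatmap peaks into scored boxes without NMS.

    heatmap[y][x] -> center score in [0, 1]
    offsets[y][x] -> (dx, dy) sub-cell offset
    sizes[y][x]   -> (w, h) box size, assumed to be in input-image pixels
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            # Max over the 3x3 neighborhood (clipped at the borders),
            # i.e. what a 3x3 max-pool would output at this cell.
            neigh = max(heatmap[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2)))
            if v == neigh and v >= threshold:
                peaks.append((v, x, y))
    peaks.sort(reverse=True)  # highest scores first
    boxes = []
    for score, x, y in peaks[:k]:
        dx, dy = offsets[y][x]
        bw, bh = sizes[y][x]
        # Refine the center with the offset, then map back to input pixels.
        cx, cy = (x + dx) * stride, (y + dy) * stride
        boxes.append((score, (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)))
    return boxes


hm = [[0.0] * 8 for _ in range(8)]
hm[2][3], hm[2][4], hm[6][6] = 0.9, 0.5, 0.7  # 0.5 sits next to the 0.9 peak
offs = [[(0.25, 0.5)] * 8 for _ in range(8)]
szs = [[(16, 8)] * 8 for _ in range(8)]
for det in decode_peaks(hm, offs, szs):
    print(det)
```

Because a stronger neighbor suppresses the weaker cell next to it, no IoU-based NMS pass is needed afterward. In a tensor framework the double loop would collapse into one max-pool and one equality mask.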

### Related Papers

CenterNet: Keypoint Triplets for Object Detection

CenterNet3D: An Anchor Free Object Detector for Point Cloud

CenterNet3D: An Anchor Free Object Detector for Autonomous Driving

Center-Based 3D Object Detection and Tracking

Bottom-Up Object Detection by Grouping Extreme and Center Points

ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

RepPoints: Point Set Representation for Object Detection

CenterPoint-SE: A Single-Stage Anchor-Free 3-D Object Detection Algorithm With Spatial Awareness Enhancement

Tracking Objects as Points

Center Point Prediction Using Gaussian Elliptic and Size Component Regression Using Small Solution Space for Object Detection

CornerNet: Detecting Objects as Paired Keypoints

Stereo CenterNet based 3D Object Detection for Autonomous Driving

GridPointNet: Grid and Point-Based 3D Object Detection from Point Cloud

From Points to Multi-Object 3D Reconstruction

CrossNet: Detecting Objects As Crosses

An anchor-free object detector with novel corner matching method

3DSSD: Point-based 3D Single Stage Object Detector