Abstract: This paper introduces the point-axis representation for oriented object detection, emphasizing its flexibility and geometrically intuitive nature with two key components: points and axes. 1) Points delineate the spatial extent and contours of objects, providing detailed shape descriptions. 2) Axes define the primary directionalities of objects, providing essential orientation cues crucial for precise detection. The point-axis representation decouples location and rotation, addressing the loss discontinuity issues commonly encountered in traditional bounding box-based approaches. For effective optimization without introducing additional annotations, we propose the max-projection loss to supervise point set learning and the cross-axis loss for robust axis representation learning. Further, leveraging this representation, we present the Oriented DETR model, seamlessly integrating the DETR framework for precise point-axis prediction and end-to-end detection. Experimental results demonstrate significant performance improvements in oriented object detection tasks.
What problem does this paper attempt to address?
This paper addresses the loss discontinuity problem in **Oriented Object Detection**. In traditional methods based on rotated bounding boxes, the regression targets for non-axis-aligned objects can change abruptly when the rotation angle wraps around or when the width and height definitions swap. The loss function then becomes discontinuous, which undermines training stability and detection performance.
### Problem Background
1. **Limitations of Traditional Methods**:
- **Rotated Bounding Box Representation**: Although it can flexibly represent objects in any direction, in some cases (such as when the length and width are nearly equal) the encoded angle flips between θ and θ ± 90°, which makes the loss function discontinuous.
- **Quadrilateral Representation**: The rotated bounding box is defined by the circumscribed horizontal box and the offsets of four vertices. However, when the object is close to the horizontal direction, the vertex regression order becomes ambiguous.
- **Point-Set Representation**: Although it can capture the detailed position of the target, it often ignores the main directionality of the object, which makes it difficult to accurately detect objects with complex shapes.
2. **Challenges of Existing Methods**:
- **Loss Discontinuity**: Because the angle is periodic, the regression target can jump abruptly at the boundary of the angle range, which makes optimization difficult.
- **Representation Ambiguity**: Some methods have ambiguity in defining boundaries, especially when dealing with approximately circular or square objects.
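The angle flip described above can be illustrated with a small numeric sketch. It assumes a "long-edge" box convention (the width is the longer side and the angle lies in [-90°, 90°)), which is one common convention and not necessarily the one the paper analyzes. The helper names `normalize`, `corners`, and `corner_set_distance` are hypothetical and exist only for this illustration.

```python
import math

def normalize(box):
    """Re-encode (cx, cy, w, h, theta_deg) so that w >= h and theta is in [-90, 90).

    This long-edge convention is an assumption made for the illustration.
    """
    cx, cy, w, h, theta = box
    if w < h:
        w, h = h, w
        theta += 90.0
    theta = (theta + 90.0) % 180.0 - 90.0
    return (cx, cy, w, h, theta)

def corners(cx, cy, w, h, theta_deg):
    """Return the four corner points of a rotated box."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    half = ((w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def corner_set_distance(box_a, box_b):
    """Mean distance from each corner of box_a to the nearest corner of box_b."""
    pa, pb = corners(*box_a), corners(*box_b)
    return sum(min(math.dist(p, q) for q in pb) for p in pa) / len(pa)

# Two nearly square boxes whose shapes differ by only 0.2 along one side.
raw_a = (0.0, 0.0, 10.1, 10.0, 10.0)
raw_b = (0.0, 0.0, 9.9, 10.0, 10.0)

enc_a, enc_b = normalize(raw_a), normalize(raw_b)

# L1 gap between the encoded parameters: dominated by a 90-degree angle jump.
param_gap = sum(abs(x - y) for x, y in zip(enc_a, enc_b))

# Geometric gap between the two boxes: small, since they almost coincide.
geom_gap = corner_set_distance(enc_a, enc_b)

print(enc_a, enc_b)
print(param_gap, geom_gap)
```

The two boxes are almost the same rectangle, yet their encoded parameters differ by roughly 90° in the angle term. A loss computed directly on those parameters therefore jumps even though the geometry barely changes.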
### Solutions Proposed in the Paper
To solve the above problems, this paper introduces a new **Point-Axis Representation**, whose core idea is to decouple the position and direction of the object, thus avoiding the loss discontinuity problem in traditional methods. Specifically:
1. **Advantages of Point-Axis Representation**:
- **Points for Shape Description**: Points are used to describe the spatial extent and contour of the object, providing a detailed shape representation that is especially suitable for irregularly shaped objects.
- **Axes for Direction Hints**: Axes are used to define the main directionality of the object, providing key orientation information that helps accurate detection.
2. **Innovative Loss Functions**:
- **Max-Projection Loss**: It supervises point-set learning and encourages the points to converge on the object without explicit keypoint annotations.
- **Cross-Axis Loss**: By discretizing the angle and applying smoothing, it generates a four-peak label encoding, which makes the axis representation more robust.
3. **Model Architecture**:
- **Oriented DETR Model**: Built on the DETR framework, it introduces conditional point queries and a point-detection decoder, captures the relationships between points through multi-layer self-attention, and refines the predictions iteratively.
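To make the four-peak label encoding mentioned under the cross-axis loss more concrete, here is a minimal sketch of how such a target could be built. The summary does not give the paper's bin count, smoothing kernel, or loss form, so everything here is an assumption: 360 one-degree bins over [0°, 360°), a Gaussian kernel with a hypothetical width `sigma_deg`, and peaks placed at θ + k·90° for k = 0..3. The function names `four_peak_label` and `decode_axis` are likewise illustrative.

```python
import math

def four_peak_label(theta_deg, num_bins=360, sigma_deg=4.0):
    """Smoothed label over angle bins with peaks at theta + k * 90 degrees.

    num_bins and sigma_deg are illustrative assumptions, not values from the paper.
    """
    bin_width = 360.0 / num_bins
    peaks = [(theta_deg + 90.0 * k) % 360.0 for k in range(4)]
    label = []
    for i in range(num_bins):
        center = i * bin_width
        best = 0.0
        for peak in peaks:
            d = abs(center - peak)
            d = min(d, 360.0 - d)  # circular distance on the angle ring
            best = max(best, math.exp(-0.5 * (d / sigma_deg) ** 2))
        label.append(best)
    return label

def decode_axis(scores):
    """Map the strongest bin back to a cross-axis angle in [0, 90)."""
    num_bins = len(scores)
    i = max(range(num_bins), key=scores.__getitem__)
    return (i * 360.0 / num_bins) % 90.0

# Angles that differ by 90 degrees describe the same pair of crossed axes,
# so they receive the same target.
label_10 = four_peak_label(10.0)
label_100 = four_peak_label(100.0)

# Angles on either side of the 90-degree boundary get nearly identical targets.
label_lo = four_peak_label(89.9)
label_hi = four_peak_label(90.1)
boundary_gap = max(abs(a - b) for a, b in zip(label_lo, label_hi))

print(decode_axis(label_10), boundary_gap)
```

Under these assumptions the target is invariant to 90° shifts of the angle and varies smoothly across the boundary where a scalar angle would wrap. That is the property a four-peak encoding is meant to provide.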
### Experimental Results
The experimental results show that this method significantly improves oriented object detection performance on multiple datasets, especially for objects with complex shapes and orientations.
In conclusion, by introducing the point-axis representation and its corresponding loss functions, this paper addresses the loss discontinuity problem in oriented object detection and improves detection accuracy and robustness.