Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang
Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical-category performance, as a crucial yet largely overlooked fine-grained objective for detection tasks. We analyze the impact of quantization at the category-level granularity, and propose methods to improve performance for the critical categories. Specifically, we find that certain critical categories have a higher sensitivity to quantization, and are prone to overfitting after quantization-aware training (QAT). To explain this, we provide theoretical and empirical links between their performance gaps and the corresponding loss landscapes with the Fisher information framework. Using this evidence, we apply a Fisher-aware mixed-precision quantization scheme, and a Fisher-trace regularization for the QAT on the critical-category loss landscape. The proposed methods improve critical-category metrics of the quantized transformer-based DETR detectors. They are even more significant in case of larger models and higher number of classes where the overfitting becomes more severe. For example, our methods lead to 10.4% and 14.5% mAP gains for, correspondingly, 4-bit DETR-R50 and Deformable DETR on the most impacted critical classes in the COCO Panoptic dataset.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper examines how quantization affects deep learning models at a fine-grained, per-category level rather than only in aggregate. It targets object detection, a harder setting because the model must satisfy both classification and regression objectives. Specifically, it studies how the performance of a subset of task-critical categories degrades under quantization and how that degradation can be reduced.
### Background and Motivation
1. **Impact of Quantization**:
- Quantization is a commonly used compression technique that can reduce the memory footprint and inference latency of models, making it suitable for deployment on cloud and edge devices.
- However, quantization perturbs weights and activations, so the quantized model performs worse than its floating-point counterpart.
2. **Limitations of Existing Research**:
- Existing quantization research mainly focuses on the trade-off between model size and overall performance, such as average accuracy in classification tasks and mean Average Precision (mAP) in detection tasks.
- In practical applications, fine-grained performance targets (e.g., performance of critical categories) are often more important than overall performance.
3. **Needs in Practical Application Scenarios**:
- For example, in autonomous driving scenarios, certain non-critical objects (such as utility poles, trees, and buildings) only need to be located to avoid collisions, and their misclassification is not as critical.
- In contrast, critical categories (such as humans or vehicles) require precise classification and localization to ensure safe operation.
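The weight perturbation described in item 1 can be illustrated with a minimal sketch. It assumes symmetric uniform quantization with a single scale per tensor, which is one common scheme and not necessarily the quantizer used in the paper. The function names and the toy weight values are hypothetical.

```python
def quantize_uniform(weights, num_bits):
    """Symmetric uniform quantization (assumed scheme): snap each weight to a
    signed integer grid, then map it back to a float."""
    qmax = 2 ** (num_bits - 1) - 1            # largest grid index, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                  # nothing to scale
    scale = max_abs / qmax                    # float value of one grid step
    quantized = []
    for w in weights:
        q = round(w / scale)                  # nearest grid index
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        quantized.append(q * scale)           # de-quantize back to float
    return quantized


def mean_squared_perturbation(weights, num_bits):
    """Average squared difference between original and quantized weights."""
    quantized = quantize_uniform(weights, num_bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


# Toy weights, made up for illustration only.
weights = [0.81, -0.33, 0.05, -0.92, 0.47, 0.10, -0.64, 0.29]
err_8bit = mean_squared_perturbation(weights, 8)
err_4bit = mean_squared_perturbation(weights, 4)
print(f"8-bit MSE: {err_8bit:.2e}, 4-bit MSE: {err_4bit:.2e}")
```

On these toy weights the 4-bit grid is much coarser than the 8-bit grid, so its perturbation is larger. How much a given perturbation hurts each category is the question the paper studies.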
### Main Contributions of the Paper
1. **Defining Critical-Category Objectives**:
- The authors propose a task-specific critical-category objective that measures detection performance on the critical categories only.
- Non-critical categories are merged into a single "other obstacles" class, which simplifies the evaluation of critical-category performance.
2. **Analysis of Quantization Impact on Critical Category Performance**:
- Experiments show that quantization affects categories unevenly: some critical categories are more sensitive to quantization, degrade more, and are prone to overfitting after quantization-aware training (QAT).
- These performance gaps are linked, theoretically and empirically, to the corresponding loss landscapes using the Fisher information framework.
3. **Proposed Improvement Methods**:
- **Fisher-aware Mixed-Precision Quantization Scheme**: Chooses bit widths across the model according to Fisher-based sensitivity to the critical categories, so that critical-category performance degrades less.
- **Fisher-trace Regularization**: Adds a Fisher-trace regularizer on the critical-category loss landscape during Quantization-Aware Training (QAT) to improve critical-category performance.
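To make the Fisher-trace idea in item 3 concrete, here is a minimal sketch on a toy linear softmax classifier. It rests on several assumptions that are not taken from the paper: the critical-category loss is modeled by summing the probabilities of all non-critical classes into one "other" class, and the trace is estimated with the empirical Fisher (mean squared gradient norm under the data labels). All function names and toy values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merged_group(label, critical, num_classes):
    """Original classes that share the merged target of `label`:
    a critical label stands alone, all non-critical labels form 'other'."""
    if label in critical:
        return {label}
    return {k for k in range(num_classes) if k not in critical}


def critical_loss_grad_logits(logits, label, critical):
    """Gradient of L = -log(sum of probs in the merged group) w.r.t. the logits."""
    probs = softmax(logits)
    group = merged_group(label, critical, len(logits))
    group_prob = sum(probs[k] for k in group)
    return [p - (p / group_prob if k in group else 0.0)
            for k, p in enumerate(probs)]


def empirical_fisher_trace(weights, samples, critical):
    """Mean squared gradient norm of the critical-category loss w.r.t. weights."""
    total = 0.0
    for x, label in samples:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        g = critical_loss_grad_logits(logits, label, critical)
        # dL/dW[k][d] = g[k] * x[d], so the squared norm factorizes.
        total += sum(gk * gk for gk in g) * sum(xi * xi for xi in x)
    return total / len(samples)


# Toy model with 3 classes and 2 input features; class 0 is treated as critical.
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
samples = [([1.0, 2.0], 0), ([0.5, -1.0], 1), ([-1.5, 0.5], 2)]
trace_critical = empirical_fisher_trace(weights, samples, critical={0})
trace_all = empirical_fisher_trace(weights, samples, critical={0, 1, 2})
print(trace_critical, trace_all)
```

In a sketch like this, a regularizer would add a multiple of `trace_critical` to the training loss, and a sensitivity-based bit-width assignment would compare such traces across layers. The paper's exact formulation may differ.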
### Experimental Results
- **Performance Comparison Before and After Quantization**:
- After quantization, the mAP of some critical categories can drop by up to 1.7%.
- After 50 rounds of QAT, performance improves, but the improvement in critical category performance is inconsistent, with a maximum improvement of 1.1% mAP.
- **Effectiveness of Proposed Improvement Methods**:
- Under the same mixed precision quantization budget, the Fisher-aware method outperforms uniform quantization and HAWQ-V2 baseline methods across different models and datasets.
- For example, 4-bit quantized DETR-R50 and Deformable DETR gain 10.4% and 14.5% mAP, respectively, on the most impacted critical classes in the COCO Panoptic dataset.
### Conclusion
By defining critical-category objectives, this paper analyzes how quantization affects them and proposes a Fisher-aware mixed-precision quantization scheme and a Fisher-trace regularization that improve the critical-category performance of quantized DETR detectors. The gains are larger for bigger models and for datasets with more classes, where overfitting after QAT is more severe. The approach is most relevant to deployments, such as autonomous driving, in which a small set of categories must be detected reliably.