Quality-Aware Network for Human Parsing

Lu Yang, Qing Song, Zhihui Wang, Zhiwei Liu, Songcen Xu, Zhihao Li
DOI: https://doi.org/10.1109/tmm.2022.3217413
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: How to estimate the quality of the network output is an important issue, and currently there is no effective solution in the field of human parsing. To solve this problem, this work proposes a statistical method based on the output probability map to calculate the pixel classification quality, which is called pixel score. In addition, the Quality-Aware Module (QAM) is proposed to fuse the different quality information, the purpose of which is to estimate the quality of human parsing results. We combine QAM with a concise and effective network design to propose Quality-Aware Network (QANet) for human parsing. Benefiting from the superiority of QAM and QANet, we achieve the best performance on three multiple and one single human parsing benchmarks, including CIHP, MHP-v2, Pascal-Person-Part, ATR and LIP. Without increasing the training and inference time, QAM improves the AP^r criterion by more than 10 points in the multiple human parsing task. QAM can be extended to other tasks with good quality estimation, e.g., instance segmentation. Specifically, QAM improves Mask R-CNN by ∼1% mAP on COCO and LVISv1.0 datasets. Based on the proposed QAM and QANet, our overall system wins 1st place in CVPR2021 L2ID High-resolution Human Parsing (HRHP) Challenge, and 2nd in CVPR2021 PIC Short-video Face Parsing (SFP) Challenge. Code and models are available at https://github.com/soeaver/QANet.
computer science, information systems,telecommunications, software engineering
### What problems does this paper attempt to solve?

This paper addresses quality estimation in the human parsing task. It proposes a statistical method that computes pixel classification quality from the output probability map, and introduces the Quality-Aware Module (QAM) to fuse quality information from different sources and thereby estimate the quality of human parsing results.

#### Main problem description:

1. **Insufficient quality evaluation**: Human parsing currently lacks an effective way to estimate the quality of network outputs. Existing methods struggle to reflect the quality of parsing results accurately, especially with complex backgrounds, easily confused categories and long-tail phenomena.
2. **Limitations of existing methods**: The scores produced by many existing methods reflect the quality of the detection results rather than the quality of the parsing results. This makes it difficult to filter out low-quality results with a threshold.
3. **Multi-human parsing challenges**: In multi-human parsing tasks, instance boundaries and confusing regions bias the average confidence calculation, resulting in inaccurate quality evaluation.

#### Solutions:

To solve the above problems, the paper proposes the following innovations:

- **Pixel Score**: High-confidence regions are extracted from the probability map, and the average confidence of these regions is used as the pixel classification quality. This reflects the classification quality of the pixels more accurately.
- **Quality-Aware Module (QAM)**: A module that fuses different quality information (such as box scores, IoU scores and pixel scores) and generates the final quality score by exponential weighting. QAM is a post-processing mechanism independent of the network structure and can be combined with other methods.
- **Quality-Aware Network (QANet)**: A network architecture for single- and multi-human parsing tasks that combines QAM with a simple and effective network design. QANet uses ResNet-FPN or HRNet as the backbone network, generates high-resolution features through semantic FPN, and predicts parsing results and IoU scores.

#### Experimental results:

The method is evaluated extensively on multiple benchmark datasets (CIHP, MHP-v2, Pascal-Person-Part, ATR and LIP), and the results show that QANet achieves state-of-the-art performance on multiple evaluation metrics. In particular, under the AP^r criterion, QANet performs significantly better than existing methods in multi-human parsing tasks.

In conclusion, this paper improves the performance and reliability of human parsing by introducing a new quality estimation method and a modular design.
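
The pixel score and the QAM fusion described under Solutions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.2 confidence threshold, the unit exponents, and the normalisation of the fused score by the sum of the exponents are all assumptions made for the example. The summary itself only states that high-confidence pixels are averaged and that the scores are combined by exponential weighting.

```python
from typing import Sequence


def pixel_score(prob_map: Sequence[Sequence[float]], threshold: float = 0.2) -> float:
    """Average confidence over the high-confidence pixels of one instance.

    prob_map holds, for every pixel, the probability of its predicted class.
    The threshold value is an assumption chosen for illustration.
    """
    confident = [p for row in prob_map for p in row if p > threshold]
    if not confident:
        # No pixel is confident enough, so report the lowest quality.
        return 0.0
    return sum(confident) / len(confident)


def quality_score(
    box_score: float,
    iou_score: float,
    pix_score: float,
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
) -> float:
    """Fuse three quality cues by exponential weighting.

    Each score is raised to its own exponent and the product is normalised by
    the sum of the exponents (a weighted geometric mean). The exponents and
    the normalisation are assumptions, not values taken from the paper.
    """
    for name, value in (("box", box_score), ("iou", iou_score), ("pixel", pix_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must lie in [0, 1], got {value}")
    total = alpha + beta + gamma
    if total <= 0:
        raise ValueError("the exponents must sum to a positive value")
    fused = (box_score ** alpha) * (iou_score ** beta) * (pix_score ** gamma)
    return fused ** (1.0 / total)


# Toy 2x3 probability map: two pixels fall below the threshold and are ignored.
toy_map = [
    [0.90, 0.80, 0.10],
    [0.95, 0.15, 0.85],
]
toy_pixel = pixel_score(toy_map)
toy_quality = quality_score(box_score=0.9, iou_score=0.8, pix_score=toy_pixel)
print(f"pixel score: {toy_pixel:.3f}, fused quality: {toy_quality:.3f}")
```

Because the fusion only consumes scores the network already produces, a sketch like this runs purely as post-processing, which is consistent with the statement that QAM is independent of the network structure. Raising one exponent makes the fused score more sensitive to that cue.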