Hierarchical Equalization Loss for Long-Tailed Instance Segmentation
Yaochi Zhao, Sen Chen, Shiguang Liu, Zhuhua Hu, Jingwen Xia
DOI: https://doi.org/10.1109/tmm.2024.3358080
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Multimedia data is large in scale and follows a skewed, long-tailed distribution, which poses a challenging imbalance problem for deep learning. In long-tailed instance segmentation, existing methods address this imbalance from a single perspective and ignore the presence of multiple imbalance factors, which limits their performance. Because imbalances exist not only between positive and negative classes but also between foreground and background subclasses and between hard and easy examples, we argue that sample losses should be hierarchically equalized at multiple levels (hierarchical equalization loss, HEL). Following this idea, we first propose a focus-based hierarchical-equalization loss (FHEL), which uses a reweighting mechanism based on the class gradient ratio to balance classes, together with a subclass-balance term and a sample-balance term that handle the inter-subclass and inter-sample imbalances, respectively. FHEL improves long-tailed instance segmentation in an end-to-end manner, avoiding the overfitting risk and manual hard division of traditional methods. Building on FHEL, we further explore the relationship between inter-subclass imbalance and inter-sample imbalance, and propose a constrained-focus-based hierarchical-equalization loss (CFHEL) that handles the imbalances at multiple levels simultaneously with fewer hyperparameters, which makes it both effective and easy to tune. We conduct extensive experiments on the LVIS v1.0 and COCO-LT datasets with different benchmarks, and both FHEL and CFHEL outperform existing methods. On LVIS v1.0, with ResNet50 Mask R-CNN, ResNet101 Mask R-CNN, ResNeXt101 Mask R-CNN and ResNet101 Cascade Mask R-CNN, CFHEL outperforms its baselines with 19.8%, 18.5%, 21.6% and 21.2% AP% gains, respectively, and with 6.7%, 6.6% and 6.5% AP gains, achieving new state-of-the-art results. On COCO-LT, CFHEL outperforms the baseline with a 13.2% tail AP gain and a 3.3% overall AP gain, also achieving new best results.
computer science, information systems, telecommunications, software engineering
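The abstract names three levels of equalization (class, subclass, sample) without giving formulas. The following is only a minimal illustrative sketch of how three such weights could be combined in a sigmoid-based classification loss; it is not the authors' FHEL or CFHEL formulation. The function name `hierarchical_loss`, the sigmoid mapping of the gradient ratio, the scalar `bg_weight`, the focal-style factor, and all default hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_loss(logits, targets, is_bg, pos_grad, neg_grad,
                      focus=2.0, bg_weight=1.0, alpha=4.0, gamma=12.0, mu=0.8):
    """Hypothetical three-level weighted binary cross-entropy.

    logits, targets: arrays of shape (N, C); targets hold 0/1 labels.
    is_bg: boolean array of shape (N,), True for background proposals.
    pos_grad, neg_grad: accumulated per-class gradient magnitudes, shape (C,).
    """
    p = sigmoid(logits)

    # Class level (assumed form): classes with a low positive/negative
    # gradient ratio get down-weighted negatives and up-weighted positives.
    ratio = pos_grad / (neg_grad + 1e-10)
    neg_w = sigmoid(gamma * (ratio - mu))
    pos_w = 1.0 + alpha * (1.0 - neg_w)
    class_w = np.where(targets == 1, pos_w, neg_w)

    # Subclass level (assumed form): negatives coming from background
    # proposals are scaled separately from those of foreground proposals.
    sub_w = np.where((targets == 0) & is_bg[:, None], bg_weight, 1.0)

    # Sample level (assumed form): focal-style factor that shrinks the
    # contribution of easy, already well-classified examples.
    p_t = np.where(targets == 1, p, 1.0 - p)
    sample_w = (1.0 - p_t) ** focus

    bce = -np.log(np.clip(p_t, 1e-12, 1.0))
    return float((class_w * sub_w * sample_w * bce).sum() / logits.shape[0])
```

With `focus=0`, `alpha=0`, `bg_weight=1` and a very large gradient ratio for every class, all three weights become 1 and the function reduces to plain binary cross-entropy averaged over proposals, which gives a convenient sanity check when experimenting with the individual terms.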