Inducing Semantic Hierarchy Structure in Empirical Risk Minimization with Optimal Transport Measures

Wanqing Xie, Yubin Ge, Site Li, Mingzhen Li, Xuyang Li, Zhenhua Guo, Jane You, Xiaofeng Liu
DOI: https://doi.org/10.1016/j.neucom.2023.01.093
IF: 6
2023-02-05
Neurocomputing
Abstract: The cross-entropy (CE) loss is arguably the most important empirical risk minimization objective for deep discriminative classification models, and it has achieved notable success in numerous applications. Although the CE loss is widely adopted, it essentially ignores the correlation between categories. For example, misclassifying a shepherd dog as a husky is more acceptable for subsequent decision processes than misclassifying it as a tiger, yet these two errors incur the same CE loss. The commonly used CE loss therefore does not account for the differing risk of misclassification across categories, which can be measured by the distance between the predicted category and the ground-truth category in a semantic hierarchical tree (SHT). In this work, to explicitly take the SHT-defined, risk-aware inter-categorical correlation into consideration, we propose a discrete optimal transport (DOT) training framework that configures its ground distance matrix accordingly. The ground distance matrix of the optimal transport measure can be predefined according to a prior on hierarchical semantic risk. Specifically, the tree-induced error (TIE) on the SHT is adopted as the ground distance matrix. Furthermore, from the optimization perspective, the ground distance can be extended to an increasing function of the TIE. In addition, the matrix can be learned adaptively with an alternating optimization scheme. The semantic similarity at each level of the tree is integrated with the information gain. We demonstrate the effectiveness of our framework on several large-scale image classification benchmarks with a semantic tree structure, and show superior performance in a plug-and-play manner. The code is available at: https://anonymous.4open.science/r/OTM-Neurocomputing/
computer science, artificial intelligence
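The shepherd dog / husky / tiger example from the abstract can be made concrete with a small sketch. It is not the authors' implementation. It assumes that the TIE between two classes is the number of edges on the path between them in the tree, and it relies on the fact that, for a one-hot ground-truth label, the discrete optimal transport cost reduces to the expected ground distance from the predicted distribution to the true class. The toy hierarchy and all names (`PARENT`, `CLASSES`, `tree_induced_error`, `ground_distance_matrix`, `ot_loss_one_hot`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "shepherd dog": "dog",
    "husky": "dog",
    "dog": "animal",
    "tiger": "cat",
    "cat": "animal",
}
CLASSES = ["shepherd dog", "husky", "tiger"]


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_induced_error(a, b):
    """Number of edges on the tree path between a and b (assumed TIE)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_a = {n: d for d, n in enumerate(path_a)}
    for depth_b, n in enumerate(path_b):
        if n in depth_in_a:  # first shared node is the lowest common ancestor
            return depth_in_a[n] + depth_b
    raise ValueError("nodes do not share a root")


def ground_distance_matrix(classes):
    """Pairwise TIE between leaf classes, used as the OT ground distance."""
    n = len(classes)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = tree_induced_error(classes[i], classes[j])
    return D


def ot_loss_one_hot(probs, target, D):
    """OT cost between a predicted distribution and a one-hot target.

    With a one-hot target all mass must move to the true class, so the
    cost is sum_i probs[i] * D[i, target].
    """
    return float(np.asarray(probs) @ D[:, target])


D = ground_distance_matrix(CLASSES)
mostly_husky = [0.1, 0.8, 0.1]  # true class is "shepherd dog" (index 0)
mostly_tiger = [0.1, 0.1, 0.8]
loss_husky = ot_loss_one_hot(mostly_husky, 0, D)
loss_tiger = ot_loss_one_hot(mostly_tiger, 0, D)
```

Both predictions assign probability 0.1 to the true class and therefore receive the same CE loss, while the tree-based loss is lower for the husky-leaning prediction than for the tiger-leaning one. This closed form holds only for one-hot targets; soft or smoothed labels would require solving a general transport problem.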