Correlations of Cross-Entropy Loss in Machine Learning

Richard Connor, Alan Dearle, Ben Claydon, Lucia Vadicamo
DOI: https://doi.org/10.3390/e26060491
IF: 2.738
2024-06-04
Entropy
Abstract: Cross-entropy loss is crucial in training many deep neural networks. In this context, we show a number of novel and strong correlations among various related divergence functions. In particular, we demonstrate that, in some circumstances, (a) cross-entropy is almost perfectly correlated with the little-known triangular divergence, and (b) cross-entropy is strongly correlated with the Euclidean distance over the logits from which the softmax is derived. The consequences of these observations are as follows. First, triangular divergence may be used as a cheaper alternative to cross-entropy. Second, logits can be used as features in a Euclidean space which is strongly synergistic with the classification process. This justifies the use of Euclidean distance over logits as a measure of similarity, in cases where the network is trained using softmax and cross-entropy. We establish these correlations via empirical observation, supported by a mathematical explanation encompassing a number of strongly related divergence functions.
physics, multidisciplinary
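To make the divergences named in the abstract concrete, here is a minimal Python (NumPy) sketch. It rests on assumptions not stated in the abstract: natural logarithms, strictly positive probability vectors (as produced by softmax), and triangular divergence taken as the standard triangular discrimination, sum_i (p_i - q_i)^2 / (p_i + q_i); the paper's exact normalisation may differ, and all function names are hypothetical. It illustrates why triangular divergence is cheaper: unlike cross-entropy, Kullback-Leibler and Jensen-Shannon divergence, it needs no logarithms.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Softmax of a 1-D logit vector after dividing it by a temperature T."""
    s = np.asarray(logits, dtype=float) / temperature
    s = s - s.max()  # stability shift; softmax is unchanged by adding a constant
    e = np.exp(s)
    return e / e.sum()


def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return float(-np.sum(p * np.log(p)))


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i log q_i; assumes every q_i > 0."""
    return float(-np.sum(p * np.log(q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint m."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def triangular_divergence(p, q):
    """Triangular discrimination sum_i (p_i - q_i)^2 / (p_i + q_i): no logarithms needed."""
    return float(np.sum((p - q) ** 2 / (p + q)))


if __name__ == "__main__":
    p = softmax([2.0, 1.0, 0.1])  # hypothetical target distribution
    q = softmax([0.5, 1.5, 0.3])  # hypothetical predicted distribution
    print("CED", cross_entropy(p, q))
    print("KLD", kl_divergence(p, q))
    print("JSD", js_divergence(p, q))
    print("TRI", triangular_divergence(p, q))
```

Because H(p, q) = H(p) + KL(p || q), cross-entropy and Kullback-Leibler divergence differ only by a term that depends on p alone, so for a fixed target they rank candidate predictions identically.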
What problem does this paper attempt to address?
The paper investigates strong correlations among several divergence functions, focusing on cross-entropy loss, a fundamental component in the training of many deep neural networks. Its key contributions and findings can be summarized as follows:

1. **Strong correlations among divergence functions**: The authors show that cross-entropy divergence (CED), Kullback-Leibler divergence (KLD), Jensen-Shannon divergence (JSD), and triangular divergence (TRI) are strongly correlated in some circumstances. These correlations are particularly strong when a higher temperature is used in the softmax function.
2. **Correlation between Euclidean distance and cross-entropy**: The paper also establishes a tight correlation between the Euclidean distance (EUC) between logit vectors and the CED between the corresponding softmax outputs, when the softmax is applied with a high temperature. This suggests that Euclidean distance over logits could be a suitable measure of similarity when the network is trained using softmax and cross-entropy.
3. **Practical applications**:
   - The almost perfect correlation between CED and TRI suggests that TRI can serve as a computationally cheaper alternative to CED.
   - The strong correlation between EUC and CED indicates that Euclidean distance over the logit space could be a better metric for assessing similarity after training, challenging the commonly recommended cosine distance.
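The comparison between Euclidean distance over logits and cross-entropy over softmax outputs can be sketched as a small experiment: sample pairs of logit vectors, compute the Euclidean distance between the raw logits and the cross-entropy between their softmax outputs at temperature T, and measure how well the two agree. This is a minimal sketch under assumptions not taken from the paper: Spearman rank correlation as the agreement measure, randomly sampled ordered pairs, and synthetic Gaussian logits; all function names are hypothetical. Cross-entropy is asymmetric, so the order of each pair (i, j) matters.

```python
import numpy as np


def log_softmax_rows(logits, temperature):
    """Row-wise log-softmax of a 2-D logit array divided by a temperature T."""
    s = logits / temperature
    s = s - s.max(axis=1, keepdims=True)  # stability shift
    return s - np.log(np.exp(s).sum(axis=1, keepdims=True))


def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of ranks (ties broken arbitrarily)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


def euc_vs_ced(logits, temperature, n_pairs=2000, seed=0):
    """Rank-correlate Euclidean distance between logit vectors with the
    cross-entropy between their temperature-scaled softmax outputs."""
    rng = np.random.default_rng(seed)
    n = len(logits)
    i = rng.integers(0, n, n_pairs)
    j = (i + rng.integers(1, n, n_pairs)) % n  # guarantees j != i
    log_probs = log_softmax_rows(logits, temperature)
    euc = np.linalg.norm(logits[i] - logits[j], axis=1)
    ced = -np.sum(np.exp(log_probs[i]) * log_probs[j], axis=1)
    return spearman(euc, ced)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical stand-in for a trained network's logits: 500 samples, 10 classes.
    logits = rng.normal(0.0, 3.0, size=(500, 10))
    for T in (1.0, 10.0, 100.0):
        print(f"T={T:>5}: Spearman(EUC, CED) = {euc_vs_ced(logits, T):.3f}")
```

The Gaussian logits are only a placeholder; checking the paper's observation would require logits from a network actually trained with softmax and cross-entropy. A rank correlation is used here because, for similarity search, what matters is whether the two measures order pairs the same way.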