Abstract: This study presents a comparative analysis of two objective functions, Mean Squared Error (MSE) and Softmax Cross-Entropy (SCE), for neural network classification tasks. While SCE combined with softmax activation is the conventional choice for transforming network outputs into class probabilities, we explore an alternative approach using MSE with sigmoid activation. We introduce the Output Reset algorithm, which reduces inconsistent errors and enhances classifier robustness. Through extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST), we demonstrate that MSE with sigmoid activation achieves comparable accuracy and convergence rates to SCE, while exhibiting superior performance in scenarios with noisy data. Our findings indicate that MSE, despite its traditional association with regression tasks, serves as a viable alternative for classification problems, challenging conventional wisdom about neural network training strategies.

What problem does this paper attempt to address?

The paper examines how Mean Squared Error (MSE) combined with the sigmoid activation function performs in neural network classification tasks, and compares it with the conventional Softmax Cross-Entropy (SCE) loss. To that end, the researchers introduce the Output Reset algorithm, which is intended to reduce inconsistent errors and make the classifier more robust. Their aim is to challenge the traditional view that MSE is suitable only for regression tasks while SCE is the standard choice for classification tasks.

### Main contributions of the paper:

1. **Proposing a classification method combining MSE and sigmoid**: The paper applies MSE as the loss function, together with the sigmoid activation function, to classification tasks, challenging the traditional view that MSE is suitable only for regression.
2. **Introducing the Output Reset algorithm**: To reduce inconsistent errors and improve classifier robustness, the researchers propose the Output Reset algorithm, which adjusts the target output during training.
3. **Experimental verification**: Extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST) indicate that the MSE-with-sigmoid method performs well on noisy data, and that its accuracy and convergence speed are comparable to those of SCE and in some cases better.

### Research background:

- **Traditional methods**: Softmax Cross-Entropy (SCE), that is, cross-entropy loss applied to softmax outputs, is the commonly used combination for classification tasks because it converts network outputs into class probabilities.
- **Application of MSE**: Although MSE is usually used for regression tasks, researchers have begun to explore its potential in classification tasks, especially when combined with the sigmoid activation function.

### Research methods:

- **Theoretical analysis**: The combination of MSE and sigmoid is analyzed mathematically, including its relationship to Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation.
- **Experimental design**: Experiments on multiple benchmark datasets compare the MSE-with-sigmoid method against SCE under different conditions, with particular attention to noisy data.

### Experimental results:

- **Accuracy**: The MSE-with-sigmoid method achieves accuracy comparable to SCE on most datasets.
- **Robustness**: On noisy data, the MSE-with-sigmoid method is more robust and maintains high classification accuracy.
- **Convergence speed**: The convergence speed of the MSE-with-sigmoid method is comparable to that of SCE and, on some datasets, faster.

### Conclusion:

Through theoretical analysis and experiments, the paper shows that MSE combined with the sigmoid activation function can be an effective alternative for neural network classification tasks. In particular, the Output Reset algorithm significantly reduces inconsistent errors and improves the robustness of the classifier. These findings challenge traditional views and suggest new directions for neural network training strategies.
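The contrast between the two objectives summarized above can be made concrete with a small NumPy sketch. This is not the paper's code, and it does not implement Output Reset, whose details are not given in this summary. The function names (`sce_loss`, `sigmoid_mse_loss`) and the example logits are illustrative assumptions. The sketch shows one commonly cited intuition for robustness to label noise, which may or may not be the mechanism the paper relies on: the per-sample sigmoid-MSE loss is bounded by the number of classes, whereas softmax cross-entropy grows without bound when the network is confident and the label disagrees.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax along the last axis.
    shifted = z - z.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sce_loss(logits, onehot):
    # Softmax cross-entropy: negative log-probability of the target class.
    return -(onehot * log_softmax(logits)).sum(axis=-1)

def sigmoid_mse_loss(logits, onehot):
    # Independent per-class sigmoid outputs, squared error against one-hot
    # targets, summed over classes. Each term lies in [0, 1).
    return ((sigmoid(logits) - onehot) ** 2).sum(axis=-1)

# Hypothetical example: the network is confident in class 0,
# but the (possibly noisy) label says class 1.
logits = np.array([[8.0, -8.0, -8.0]])
noisy_target = np.array([[0.0, 1.0, 0.0]])

sce = sce_loss(logits, noisy_target)
mse = sigmoid_mse_loss(logits, noisy_target)
print("softmax cross-entropy:", sce)
print("sigmoid MSE:          ", mse)
```

For these logits the softmax cross-entropy is about 16 and keeps growing linearly as the logit gap widens, while the sigmoid-MSE value stays below the number of classes (3 here). A single mislabeled example therefore contributes a bounded loss under sigmoid-MSE.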

Making Sigmoid-MSE Great Again: Output Reset Challenges Softmax Cross-Entropy in Neural Network Classification
