Making Sigmoid-MSE Great Again: Output Reset Challenges Softmax Cross-Entropy in Neural Network Classification

Kanishka Tyagi, Chinmay Rane, Ketaki Vaidya, Jeshwanth Challgundla, Soumitro Swapan Auddy, Michael Manry
2024-11-18
Abstract: This study presents a comparative analysis of two objective functions, Mean Squared Error (MSE) and Softmax Cross-Entropy (SCE), for neural network classification tasks. While SCE combined with softmax activation is the conventional choice for transforming network outputs into class probabilities, we explore an alternative approach using MSE with sigmoid activation. We introduce the Output Reset algorithm, which reduces inconsistent errors and enhances classifier robustness. Through extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST), we demonstrate that MSE with sigmoid activation achieves comparable accuracy and convergence rates to SCE, while exhibiting superior performance in scenarios with noisy data. Our findings indicate that MSE, despite its traditional association with regression tasks, serves as a viable alternative for classification problems, challenging conventional wisdom about neural network training strategies.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines how Mean Squared Error (MSE) combined with the sigmoid activation function performs in neural network classification tasks, and compares it with the conventional Softmax Cross-Entropy (SCE) loss. The researchers introduce the Output Reset algorithm to reduce inconsistent errors and make the classifier more robust. In doing so, they challenge the traditional view that MSE is suitable only for regression tasks while SCE is the standard choice for classification.

### Main contributions of the paper:

1. **Proposing a classification method combining MSE and sigmoid**: The paper explores MSE as a loss function, paired with the sigmoid activation function, for classification tasks, challenging the traditional view that MSE is suitable only for regression.
2. **Introducing the Output Reset algorithm**: To reduce inconsistent errors and improve classifier robustness, the researchers propose the Output Reset algorithm, which adjusts the target outputs during training.
3. **Experimental verification**: Extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST) show that MSE with sigmoid performs well on noisy data, and that its accuracy and convergence speed are comparable to those of SCE and in some cases better.

### Research background:

- **Traditional methods**: Softmax Cross-Entropy (SCE) combined with the softmax activation function is the commonly used loss and activation pairing for classification tasks; it converts network outputs into class probabilities.
- **Application of MSE**: Although MSE is usually used for regression tasks, researchers have begun to explore its potential in classification tasks, especially when combined with the sigmoid activation function.

### Research methods:

- **Theoretical analysis**: The theoretical basis of combining MSE with sigmoid is analyzed mathematically, including its relationship to Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation.
- **Experimental design**: Experiments on multiple benchmark datasets compare MSE with sigmoid against SCE under different conditions, with particular attention to noisy data.

### Experimental results:

- **Accuracy**: MSE with sigmoid achieves accuracy comparable to SCE on most datasets.
- **Robustness**: On noisy data, MSE with sigmoid is more robust and maintains high classification accuracy.
- **Convergence speed**: The convergence speed of MSE with sigmoid is comparable to that of SCE and, on some datasets, faster.

### Conclusion:

Through theoretical analysis and experiments, the paper argues that MSE combined with the sigmoid activation function can be an effective alternative for neural network classification tasks. In particular, introducing the Output Reset algorithm significantly reduces inconsistent errors and improves classifier robustness. These findings challenge traditional views and suggest new directions for neural network training strategies.
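
The two objectives compared above can be sketched in a few lines of NumPy. This is an illustration only, not the authors' implementation. The summary says only that Output Reset adjusts the target output, so the `reset_targets` rule below (replace a target with the actual output whenever the output already lies beyond the target on the correct side) is an assumed reading. The 0.9/0.1 margin targets, the function names, and the toy logits are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid_mse(logits, targets):
    # Squared error between sigmoid outputs and targets, averaged over samples.
    y = sigmoid(logits)
    return np.mean(np.sum((y - targets) ** 2, axis=1))

def softmax_cross_entropy(logits, onehot):
    # Conventional SCE: negative log-probability of the true class.
    p = softmax(logits)
    return -np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1))

def reset_targets(outputs, targets, onehot):
    # ASSUMED rule, not taken from the paper: an output that already
    # overshoots the target on the correct class, or undershoots it on an
    # incorrect class, is not penalized; its target is reset to the output.
    correct = onehot.astype(bool)
    overshoot = correct & (outputs > targets)
    undershoot = ~correct & (outputs < targets)
    return np.where(overshoot | undershoot, outputs, targets)

# Toy batch: two samples, three classes (hypothetical values).
logits = np.array([[4.0, -3.0, -1.0],
                   [0.5, 0.2, -0.1]])
onehot = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
# Hypothetical margin targets, so sigmoid outputs can exceed them.
targets = np.where(onehot == 1.0, 0.9, 0.1)

y = sigmoid(logits)
plain_mse = sigmoid_mse(logits, targets)
new_targets = reset_targets(y, targets, onehot)
reset_mse = np.mean(np.sum((y - new_targets) ** 2, axis=1))
sce = softmax_cross_entropy(logits, onehot)

print("sigmoid-MSE:", plain_mse)
print("sigmoid-MSE with reset targets:", reset_mse)
print("softmax cross-entropy:", sce)
```

In this toy batch, the first sample is classified confidently and correctly, so two of its three targets are reset and no longer contribute to the error. The second sample is misclassified and keeps its full penalty.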