Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs

Takashi Shinozaki
DOI: https://doi.org/10.48550/arXiv.1804.09859
2018-04-26
Abstract: In this study, we propose the integration of competitive learning into convolutional neural networks (CNNs) to improve the representation learning and efficiency of fine-tuning. Conventional CNNs use back propagation learning, which enables powerful representation learning by a discrimination task. However, it requires a huge amount of labeled data, and acquisition of labeled data is much harder than that of unlabeled data. Thus, efficient use of unlabeled data is becoming crucial for DNNs. To address the problem, we introduce unsupervised competitive learning into the convolutional layer, and utilize unlabeled data for effective representation learning. The results of validation experiments using a toy model demonstrated that strong representation learning effectively extracted bases of images into convolutional filters using unlabeled data, and accelerated the speed of the fine-tuning of subsequent supervised back propagation learning. The leverage was more apparent when the number of filters was sufficiently large, and, in such a case, the error rate steeply decreased in the initial phase of fine-tuning. Thus, the proposed method enlarged the number of filters in CNNs, and enabled a more detailed and generalized representation. It could provide a possibility of not only deep but broad neural networks.
Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper asks how unlabeled data can be used effectively in convolutional neural networks (CNNs) to improve representation learning and accelerate fine-tuning. Conventional CNNs rely mainly on supervised back-propagation (BP) learning. BP achieves powerful representation learning, but it requires a large amount of labeled data, which is costly and difficult to obtain, whereas unlabeled data is comparatively easy to collect. Making efficient use of unlabeled data has therefore become a key issue in research on deep neural networks (DNNs).

To address this, the author introduces competitive learning into the convolutional layers of CNNs so that representations can be learned from unlabeled data. Competitive learning is an unsupervised method that extracts feature bases from the input data without relying on task labels, which yields more general and detailed representations. In the paper's validation experiments with a toy model, CNNs trained with competitive learning learned stronger representations from unlabeled data, and the subsequent supervised BP fine-tuning was faster. The effect was most apparent when the number of filters was sufficiently large, in which case the error rate dropped steeply in the initial phase of fine-tuning.

In summary, the main contributions of the paper are:

1. It proposes combining competitive learning with CNNs to make use of unlabeled data.
2. It shows experimentally that this method strengthens representation learning and accelerates fine-tuning.
3. It raises the possibility of increasing the number of filters, so that CNNs can achieve more detailed and diverse representations.
The core formulas of the method are:

- The weight update rule for competitive learning:

  \[ \Delta w_{l,i} = \begin{cases} -\rho z_{l-1} & \text{if } i = \arg\max_k u_{l,k} \\ 0 & \text{otherwise} \end{cases} \]

  where $\rho$ is the learning coefficient, $z_{l-1}$ is the output vector of the previous layer, $u_l = W_l z_{l-1}$ is the input vector of the $l$-th layer, and $u_{l,k}$ is its $k$-th component.

- L2 normalization of the weight vector:

  \[ w'_{l,i} = \frac{w_{l,i} + \Delta w_{l,i}}{\|w_{l,i} + \Delta w_{l,i}\|} \]

Only the unit with the largest input is updated, and the normalization keeps every weight vector at unit length, so the weights cannot grow without bound under repeated updates.
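To make the two formulas concrete, here is a minimal NumPy sketch of a single update step for one fully connected layer. It is an illustration under stated assumptions, not the author's implementation: the function name `competitive_update`, the array shapes, and the random toy data are all hypothetical, and applying the rule to convolutional filters (for example, patch by patch) is not shown. The sketch applies the increment literally as printed, including the sign of $-\rho z_{l-1}$. Under the common winner-take-all convention, where the winner is pulled toward the input, the increment would be $+\rho z_{l-1}$, which corresponds to passing a negative `rho` here.

```python
import numpy as np

def competitive_update(W, z, rho):
    """One winner-take-all step for a single layer (illustrative sketch).

    W   : (n_units, n_inputs) weight matrix, one row w_{l,i} per unit
    z   : (n_inputs,) output vector of the previous layer, z_{l-1}
    rho : learning coefficient
    Returns the updated weight matrix and the index of the winning unit.
    """
    u = W @ z                   # u_l = W_l z_{l-1}
    winner = int(np.argmax(u))  # i = argmax_k u_{l,k}

    # Increment as printed in the formula: -rho * z for the winner, 0 otherwise.
    delta = np.zeros_like(W)
    delta[winner] = -rho * z

    # L2 normalization of every row: w' = (w + delta) / ||w + delta||.
    # Assumes no row of W + delta is the zero vector.
    W_new = W + delta
    W_new /= np.linalg.norm(W_new, axis=1, keepdims=True)
    return W_new, winner


# Toy usage with hypothetical sizes: 4 units, 3 inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # start from unit-length rows
z = np.array([1.0, 0.0, 0.0])
W_new, winner = competitive_update(W, z, rho=0.1)
```

Because the rows start at unit length, the normalization leaves the non-winning rows unchanged and only rescales the winner after its increment.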