Unsupervisedly Learned Representations: Should the Quest be Over?

Daniel N. Nissani
2024-09-26
Abstract: After four decades of research there still exists a Classification accuracy gap of about 20% between our best Unsupervisedly Learned Representations methods and the accuracy rates achieved by intelligent animals. It thus may well be that we are looking in the wrong direction. A possible solution to this puzzle is presented. We demonstrate that Reinforcement Learning can learn representations which achieve the same accuracy as that of animals. Our main modest contribution lies in the observations that: a. when applied to a real world environment Reinforcement Learning does not require labels, and thus may be legitimately considered as Unsupervised Learning, and b. in contrast, when Reinforcement Learning is applied in a simulated environment it does inherently require labels and should thus generally be considered as Supervised Learning. The corollary of these observations is that further search for Unsupervised Learning competitive paradigms which may be trained in simulated environments may be futile.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses **the approximately 20% gap in classification accuracy between unsupervisedly learned representations and intelligent animals (especially humans)**. After four decades of research, unsupervised learning methods have made progress, yet they still fall well short of intelligent animals on classification tasks.

### Specific problem description:

1. **Limitations of unsupervised learning**: The best current unsupervised learning methods reach a classification accuracy approximately 20% below that of intelligent animals. This gap suggests that the search for solutions may be headed in the wrong direction.
2. **Differences between simulated and real environments**: The paper points out that Reinforcement Learning (RL) applied in a real environment requires no labels, so it can be regarded as unsupervised learning. RL applied in a simulated environment inherently requires labels, so it should generally be regarded as supervised learning.

### Main contributions of the paper:

- **Proposing a new perspective**: The paper demonstrates that RL can learn representations that reach the same accuracy as animals. Applied in a real environment, where it needs no labels, RL thus offers an unsupervised route to bridging the existing performance gap.
- **Distinguishing the nature of RL in different environments**: RL can be regarded as unsupervised learning in a real environment, while in a simulated environment it is closer to supervised learning.

### Core viewpoints of the paper:

- **The likely futility of further searching in simulated environments**: Continuing to search for competitive unsupervised learning methods that can be trained in simulated environments may be in vain. Instead, more attention should be paid to how RL can be applied in a real environment to achieve unsupervised learning.
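
The distinction between simulated and real environments described above can be illustrated with a minimal sketch. It assumes a hypothetical simulated classification environment in which the agent's action is a class guess; the class name `SimulatedClassificationEnv`, its `reset`/`step` methods, and the +1/-1 reward are illustrative choices, not taken from the paper. The point is only that the simulator cannot emit a reward without consulting stored ground-truth labels.

```python
import random


class SimulatedClassificationEnv:
    """Hypothetical simulated environment: the action is a class guess."""

    def __init__(self, samples, labels, seed=0):
        # The simulator has to hold ground-truth labels in order to
        # produce any reward signal at all (assumed setup, for illustration).
        self.samples = samples
        self.labels = labels
        self.rng = random.Random(seed)
        self.index = None

    def reset(self):
        # Pick a random sample and present it as the observation (state).
        self.index = self.rng.randrange(len(self.samples))
        return self.samples[self.index]

    def step(self, action):
        # The reward is derived by comparing the action with the stored label.
        # Remove self.labels and this line can no longer be computed.
        return 1.0 if action == self.labels[self.index] else -1.0


env = SimulatedClassificationEnv(samples=[[0.1], [0.9]], labels=[0, 1])
observation = env.reset()
reward = env.step(action=0)  # +1.0 or -1.0, depending on the drawn sample
```

In a real environment the reward would instead arrive from the world itself (for example, food obtained or not), so no label store comparable to `self.labels` has to be supplied by the experimenter.
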
### Conclusion:

The paper suggests that if we can successfully apply RL in a real environment, we may find a truly effective unsupervised learning method, thereby narrowing or eliminating the performance gap with animals. This provides a new direction and idea for future unsupervised learning research.

---

### Summary of relevant formulas:

1. **Definition of cumulative reward (Return)**:

   \[ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = R_{t+1} + \gamma G_{t+1} \]

   where \(G_t\) is the cumulative reward starting from time step \(t\), \(R_{t+i}\) is the immediate reward obtained at time step \(t+i\), and \(\gamma\) is the discount factor (\(0 \leqslant \gamma \leqslant 1\)).

2. **Definition of state-value function (Value Function)**:

   \[ V(S_t = s) = E[G_t \mid S_t = s] = E[R_{t+1} + \gamma V(S_{t+1}) \mid S_t = s] \]

   This is in the form of the Bellman equation.

3. **Parameter update formulas**:

   - For the update of the value function:

     \[ V(S_t = s)_{\text{new}} = V(S_t = s)_{\text{old}} + \alpha \left[ R_{t+1} + \gamma V(S_{t+1} = s')_{\text{old}} - V(S_t = s)_{\text{old}} \right] \]

   - For the update of the policy network (Policy Gradient):

     \[ W_\pi^{t+1} = W_\pi^t + \eta_\pi \left[ R_{t+1} + \gamma V(S_{t+1}; W_V^t) - V(S_t; W_V^t) \right] \nabla \log \pi(A_t \mid S_t; W_\pi^t) \]

These formulas show the core mechanisms in RL, especially how to optimize the policy through cumulative rewards and value functions.
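
The formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: it assumes a tabular setting in which `V` is a dictionary of state values and the policy is a softmax over per-state action preferences `W[s]`, whereas the formulas are written for parameterized networks \(W_V\) and \(W_\pi\). All function names are hypothetical.

```python
import math


def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * G_{t+1}, accumulated backwards over an episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_update(V, s, r, s_next, alpha, gamma):
    """Value update: V(s) += alpha * [r + gamma * V(s') - V(s)]. Returns the TD error."""
    td_error = r + gamma * V[s_next] - V[s]  # uses the old values on both sides
    V[s] += alpha * td_error
    return td_error


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def policy_update(W, s, a, td_error, eta):
    """Policy-gradient step: W += eta * td_error * grad log pi(a | s).

    For a tabular softmax policy (an assumption of this sketch), the gradient
    of log pi(a | s) with respect to preference W[s][b] is 1[b == a] - pi(b | s).
    """
    probs = softmax(W[s])
    for b in range(len(W[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        W[s][b] += eta * td_error * grad_log


# One transition: state "s0", action 0, reward 1.0, next state "s1".
V = {"s0": 0.0, "s1": 1.0}
W = {"s0": [0.0, 0.0]}
delta = td0_update(V, "s0", r=1.0, s_next="s1", alpha=0.1, gamma=0.9)
policy_update(W, "s0", a=0, td_error=delta, eta=0.5)
```

The same TD error drives both updates: it is computed from the old value estimates, as in the bracketed term shared by the two update formulas, and then scales the step on the value table and on the policy preferences.
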