Abstract: After four decades of research, there is still a classification accuracy gap of about 20% between our best Unsupervisedly Learned Representations methods and the accuracy achieved by intelligent animals. It may thus well be that we are looking in the wrong direction. A possible solution to this puzzle is presented. We demonstrate that Reinforcement Learning can learn representations which achieve the same accuracy as that of animals. Our main, modest contribution lies in the observations that: a. when applied to a real-world environment, Reinforcement Learning does not require labels and may thus legitimately be considered Unsupervised Learning; and b. in contrast, when Reinforcement Learning is applied in a simulated environment, it inherently requires labels and should thus generally be considered Supervised Learning. The corollary of these observations is that further search for competitive Unsupervised Learning paradigms that can be trained in simulated environments may be futile.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is **the approximately 20% gap in classification accuracy between unsupervisedly learned representations and intelligent animals (especially humans)**. After 40 years of research, unsupervised learning methods have made progress, but on classification tasks they still lag well behind intelligent animals.

### Specific problem description:

1. **Limitations of unsupervised learning**: The best current unsupervised learning methods reach a classification accuracy approximately 20% lower than that of intelligent animals. This gap suggests that we may be looking for solutions in the wrong direction.
2. **Differences between simulated and real environments**: The paper points out that when Reinforcement Learning (RL) is applied in a real environment, no labels are required, so it can be regarded as unsupervised learning; when RL is applied in a simulated environment, labels are inherently required, so it should be regarded as supervised learning.

### Main contributions of the paper:

- **Proposing a new perspective**: The paper demonstrates that RL can learn representations that reach the same classification accuracy as animals; applied in a real environment, RL therefore offers an unsupervised route to bridging the existing performance gap.
- **Distinguishing the nature of RL in different environments**: RL can be regarded as unsupervised learning in a real environment, whereas in a simulated environment it is effectively supervised learning.

### Core viewpoints of the paper:

- **Doubts about continuing the search in simulation**: If we keep searching for unsupervised learning methods that can match the performance of animals while being trained in a simulated environment, the effort may be in vain. More attention should instead be paid to applying RL in real environments to achieve unsupervised learning.
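To make the simulated-environment case concrete, here is a minimal sketch, assuming a hypothetical one-step "classification as RL" simulator; the names `LabeledSample` and `SimulatedClassificationEnv` are illustrative and do not come from the paper. It shows where the label hides: the agent never sees it, yet the simulator can only compute the reward because its designer stored the ground-truth class.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class LabeledSample:
    features: list[float]
    label: int  # ground-truth class, known only to the (hypothetical) simulator


class SimulatedClassificationEnv:
    """Hypothetical one-step environment: observe a sample, then guess its class."""

    def __init__(self, dataset: list[LabeledSample], seed: int = 0):
        self.dataset = dataset
        self.rng = random.Random(seed)
        self.current: LabeledSample | None = None

    def reset(self) -> list[float]:
        # Draw a sample; the agent observes only its features, never the label.
        self.current = self.rng.choice(self.dataset)
        return self.current.features

    def step(self, action: int) -> float:
        if self.current is None:
            raise RuntimeError("call reset() before step()")
        # The reward is computed from the hidden label: this is where the
        # supervision enters, even though no label is shown to the agent.
        return 1.0 if action == self.current.label else 0.0
```

In the paper's framing, an agent acting in the real world has no counterpart to `self.current.label`: the environment itself delivers the reward, which is why RL in that setting is treated as unsupervised.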
### Conclusion:

The paper suggests that if RL can be successfully applied in a real environment, it may yield a truly effective unsupervised learning method, narrowing or eliminating the performance gap with animals. This offers a new direction for future unsupervised learning research.

---

### Summary of relevant formulas:

1. **Definition of cumulative reward (return)**:

   \[ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = R_{t+1} + \gamma G_{t+1} \]

   where \(G_t\) is the cumulative reward starting from time step \(t\), \(R_{t+i}\) is the immediate reward obtained at time step \(t+i\), and \(\gamma\) is the discount factor (\(0 \leqslant \gamma \leqslant 1\)).

2. **Definition of the state-value function**:

   \[ V(S_t = s) = \mathbb{E}[G_t \mid S_t = s] = \mathbb{E}[R_{t+1} + \gamma V(S_{t+1}) \mid S_t = s] \]

   This is the Bellman equation for the state-value function.

3. **Parameter update formulas**:
   - Value-function update (tabular TD(0)), where \(s'\) is the observed next state and \(\alpha\) is the step size:

     \[ V(S_t = s)_{\text{new}} = V(S_t = s)_{\text{old}} + \alpha\left[R_{t+1} + \gamma V(S_{t+1} = s')_{\text{old}} - V(S_t = s)_{\text{old}}\right] \]

   - Policy-network update (policy gradient in one-step actor-critic form), where the bracketed TD error is computed with the critic weights \(W_V^t\) and \(\eta_\pi\) is the policy learning rate:

     \[ W_\pi^{t+1} = W_\pi^t + \eta_\pi\left[R_{t+1} + \gamma V(S_{t+1}; W_V^t) - V(S_t; W_V^t)\right] \nabla_{W_\pi} \log \pi(A_t \mid S_t; W_\pi^t) \]

These formulas show the core mechanisms of RL, in particular how the policy is optimized using cumulative rewards and value functions.
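The three formulas can be sketched in plain Python as below. This is a minimal tabular illustration under stated assumptions, not the paper's implementation: the critic is a lookup table rather than a network with weights W_V, the policy is a softmax over hypothetical per-state action preferences `W_pi[s]`, and `train_one_state_task` is a toy task invented for the example.

```python
from __future__ import annotations

import math
import random


def discounted_return(rewards: list[float], gamma: float) -> float:
    """G_t = R_{t+1} + gamma * G_{t+1}, evaluated backwards from the end of the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_update(V: dict, s, r: float, s_next, alpha: float, gamma: float) -> float:
    """Tabular TD(0): V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]; returns the TD error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta


def softmax(prefs: list[float]) -> list[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic_step(W_pi: dict, V: dict, s, a: int, r: float, s_next,
                      eta_pi: float, alpha: float, gamma: float) -> float:
    """One-step actor-critic with a tabular softmax policy (assumed parameterization).

    For pi(.|s) = softmax(W_pi[s]), d log pi(a|s) / d W_pi[s][b] = 1[b == a] - pi(b|s).
    """
    delta = r + gamma * V[s_next] - V[s]  # TD error from the critic before it is updated
    probs = softmax(W_pi[s])
    for b in range(len(W_pi[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        W_pi[s][b] += eta_pi * delta * grad_log  # actor: policy-gradient step
    V[s] += alpha * delta  # critic: TD(0) step with the same TD error
    return delta


def train_one_state_task(episodes: int = 500, seed: int = 0):
    """Hypothetical task: in state "s0", action 1 pays 1 and action 0 pays 0; each episode then ends."""
    rng = random.Random(seed)
    W_pi = {"s0": [0.0, 0.0]}
    V = {"s0": 0.0, "end": 0.0}  # the terminal state's value stays 0
    for _ in range(episodes):
        a = rng.choices([0, 1], weights=softmax(W_pi["s0"]))[0]
        r = 1.0 if a == 1 else 0.0
        actor_critic_step(W_pi, V, "s0", a, r, "end", eta_pi=0.1, alpha=0.1, gamma=0.9)
    return softmax(W_pi["s0"]), V["s0"]
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates \(1 + 0.5(1 + 0.5 \cdot 1) = 1.75\). The same TD error drives both the critic and the actor, mirroring the bracketed term shared by the two update formulas; in the paper's setting the table and the preference vector would be replaced by networks with weights W_V and W_π.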

Unsupervisedly Learned Representations: Should the Quest be Over?
