Abstract: I apply recent work on "learning to think" (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general-purpose computer) called ONE. ONE is unusual in that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn by neuroevolution to control a robot through environment-changing actions, and learn by unsupervised gradient descent to predict future inputs and vector-valued reward signals, as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE's copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the environment. More and more control and prediction skills are thus collapsed into ONE, as in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in the form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills.
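The consolidation step (a copy learns a new skill and may forget old ones; ONE is then retrained on stored traces of both) can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the method of the paper: ONE is replaced by a linear model over hypothetical gated features, the goal-defining input is a one-hot task code, the copy learns the new skill by plain gradient descent on a known target (standing in for teacherless neuroevolution), and both "skills" are hypothetical 1-D functions. One weight is shared by all tasks, loosely playing the role of a shared subnetwork, so training the copy alone on the new skill disturbs the old one.

```python
# Toy sketch only: a linear model stands in for the recurrent network ONE.
# All feature, task and function names here are hypothetical.

def features(x, task, n_tasks=2):
    # One shared weight on x (a stand-in for a shared subnetwork), plus
    # task-gated slope and bias features selected by a one-hot goal code.
    code = [1.0 if k == task else 0.0 for k in range(n_tasks)]
    return [x] + [x * c for c in code] + code

def predict(w, x, task):
    return sum(wi * fi for wi, fi in zip(w, features(x, task)))

def train(w, traces, epochs=2000, lr=0.05):
    # Full-batch gradient descent on mean squared error over stored
    # (input, task, output) traces; returns a new weight list.
    w = list(w)
    n = len(traces)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, task, y in traces:
            err = predict(w, x, task) - y
            for i, fi in enumerate(features(x, task)):
                grad[i] += 2.0 * err * fi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def mse(w, traces):
    return sum((predict(w, x, t) - y) ** 2 for x, t, y in traces) / len(traces)

def record(w, traces):
    # Replace targets by the model's own outputs: an input/output trace
    # of a network executing its skill.
    return [(x, t, predict(w, x, t)) for x, t, _ in traces]

xs = [i / 4 for i in range(-4, 5)]
old_skill = [(x, 0, 2 * x + 1) for x in xs]    # hypothetical skill 0
new_skill = [(x, 1, -x + 0.5) for x in xs]     # hypothetical skill 1

one_before = train([0.0] * 5, old_skill)       # ONE already masters skill 0
copy = train(one_before, new_skill)            # a copy learns skill 1 alone

# Retrain ONE on traces of (a) the copy executing the new skill and
# (b) the previous ONE executing the old skill.
traces = record(copy, new_skill) + record(one_before, old_skill)
one_after = train(one_before, traces)

print("copy, old skill:", mse(copy, old_skill))          # about 0.417: forgotten
print("ONE after, old skill:", mse(one_after, old_skill))
print("ONE after, new skill:", mse(one_after, new_skill))
```

Training on recorded outputs of the copy and of the previous ONE, rather than on the original targets, mirrors the use of stored input/output traces; in this toy both give the same result up to numerical error, whereas in the setting described above the new skill may have been found without any teacher.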

Limits of End-to-End Learning

Putting An End to End-to-End: Gradient-Isolated Learning of Representations

End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training

Towards New Generation, Biologically Plausible Deep Neural Network Learning

Functions that Emerge through End-to-End Reinforcement Learning - The Direction for Artificial General Intelligence -

Scaling Laws Beyond Backpropagation

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Limitations of Neural Collapse for Understanding Generalization in Deep Learning

Unsupervised End-to-End Training with a Self-Defined Target

Towards learning-to-learn

The Perils of Learning Before Optimizing

Continual Learning with Deep Artificial Neurons

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

The limitations of automatically generated curricula for continual learning

On the Complexity of Learning Neural Networks

One Big Net For Everything

Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals

Towards the One Learning Algorithm Hypothesis: A System-theoretic Approach

Transferring Knowledge across Learning Processes

Is Extreme Learning Machine Feasible? A Theoretical Assessment (Part II)

Performance comparison of various end-to-end learning technologies with a bandwidth-limited OWC system