On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline.

Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang
DOI: https://doi.org/10.48550/arxiv.2212.05749
2022-01-01
Abstract: In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.