Internet Explorer: Targeted Representation Learning on the Open Web

Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak
2023-09-07
Abstract: Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet, where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30-40 hours. Results, visualizations, and videos are available at https://internet-explorer-ssl.github.io/
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing, Robotics
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper "Internet Explorer: Targeted Representation Learning on the Open Web" attempts to solve the problem of how to leverage dynamic data from the internet to improve visual model representations for specific tasks. Traditional deep learning models typically rely on large-scale, static datasets for pre-training, followed by fine-tuning on small-scale datasets for specific tasks. However, these static datasets often fail to capture the rich, continuously updated information available on the internet, leading to suboptimal performance when models encounter new data.

Specifically, the paper proposes a method called **Internet Explorer** to address this issue through the following steps:

1. **Dynamic Utilization of Internet Data**: Unlike traditional static datasets, Internet Explorer views the internet as a dynamic, open data source. It incrementally finds image data relevant to the target task through a self-supervised approach.
2. **Self-Supervised Exploration**: The method uses text queries to search engines, downloads relevant images, and performs self-supervised training to enhance performance on the target dataset.
3. **Continuous Query Optimization**: Internet Explorer continuously evaluates the contribution of downloaded images to the target dataset and adjusts subsequent query strategies based on this feedback, thereby gradually improving the quality of model representations.

Through this approach, the paper aims to overcome the limitations of static datasets and efficiently enhance model performance for specific tasks by leveraging the rich resources available on the internet.
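
The search, score, and reprioritize cycle described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the "search engine" is a stub that returns made-up 2-D feature vectors, the self-supervised training step is left as a comment, and the names `QUERY_CENTERS`, `search_images`, `image_reward`, `sample_query`, and `explore` are hypothetical. The cosine-similarity reward and the softmax-style query sampling are illustrative choices; the summary does not say how the paper scores images or selects queries.

```python
import math
import random

# Hypothetical stand-in for the web: each text query "returns" 2-D feature
# vectors clustered around a fixed point. None of this data is from the paper.
QUERY_CENTERS = {
    "golden retriever": (1.0, 0.1),
    "tabby cat": (0.8, 0.6),
    "sports car": (-0.9, 0.3),
    "skyscraper": (-0.2, -1.0),
}


def search_images(query, rng, n=8):
    """Pretend to download n images for a query; returns feature vectors."""
    cx, cy = QUERY_CENTERS[query]
    return [(cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)) for _ in range(n)]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def image_reward(image, target_features):
    """Assumed usefulness score: best similarity to any target example."""
    return max(cosine(image, t) for t in target_features)


def sample_query(scores, rng, temperature=0.1):
    """Sample a query with probability proportional to exp(score / temperature)."""
    queries = list(scores)
    weights = [math.exp(scores[q] / temperature) for q in queries]
    return rng.choices(queries, weights=weights, k=1)[0]


def explore(target_features, rounds=40, seed=0):
    """Run the toy loop and return per-query scores, counts, and kept images."""
    rng = random.Random(seed)
    queries = list(QUERY_CENTERS)
    scores = {q: 0.0 for q in queries}  # running mean reward per query
    counts = {q: 0 for q in queries}    # how often each query was issued
    kept = []                           # images judged relevant so far
    for step in range(rounds):
        # 1. Choose a query: try each one once, then favor high-scoring ones.
        if step < len(queries):
            query = queries[step]
        else:
            query = sample_query(scores, rng)
        # 2. "Download" the search results.
        images = search_images(query, rng)
        # 3. Score each image against the target dataset and keep the useful ones.
        rewards = [image_reward(im, target_features) for im in images]
        kept.extend(im for im, r in zip(images, rewards) if r > 0.9)
        # (A real system would run self-supervised training on `kept` here.)
        # 4. Update the query's running mean so later searches are reprioritized.
        counts[query] += 1
        mean_reward = sum(rewards) / len(rewards)
        scores[query] += (mean_reward - scores[query]) / counts[query]
    return scores, counts, kept


if __name__ == "__main__":
    # Hypothetical target dataset: two feature vectors pointing roughly "right".
    targets = [(1.0, 0.0), (0.9, 0.2)]
    scores, counts, kept = explore(targets)
    for q in sorted(scores, key=scores.get, reverse=True):
        print(f"{q:18s} score={scores[q]:+.3f} searches={counts[q]}")
    print("images kept:", len(kept))
```

In this toy setup, queries whose results resemble the target examples accumulate higher running scores and are searched more often, which mirrors the feedback idea in step 3 of the list. The initial pass that tries every query once is a design choice of the sketch: without it, a query that has never been issued keeps its default score and may almost never be sampled.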