Big Cooperative Learning

Yulai Cong
2024-07-31
Abstract: Cooperation plays a pivotal role in the evolution of human intelligence; it also underlies the recent revolutionary advancement of artificial intelligence (AI) driven by foundation models. Specifically, we reveal that the training of foundation models can be interpreted as a form of big cooperative learning (abbreviated as big learning), where massive learning individuals/tasks cooperate to approach the unique essence of data from diverse perspectives of data prediction, leveraging a universal model. Big learning therefore unifies most training objectives of foundation models within a consistent framework, in which their underlying assumptions are exposed simultaneously. We design tailored simulations to demonstrate the principle of big learning, based on which we provide learning-perspective justifications for the successes of foundation models, with interesting side-products. Furthermore, we reveal that big learning is a new dimension for upgrading conventional machine learning paradigms, one that can reinvigorate the associated applications; as an illustrative example, we propose the BigLearn-GAN, a novel adversarially-trained foundation model with versatile data sampling capabilities. Code is available at https://github.com/YulaiCong/BigCooperativeLearning.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to address is the training-test discrepancy in existing foundation model training methods. While current foundation models such as BERT and GPT have achieved significant success on certain tasks, their training methods often use only part of the information in each data sample, so the capabilities a model practices during training do not match those it needs at test time. For example, mask-and-predict training focuses on predicting the masked parts, which may not be the capability most needed at test time; next-token-prediction training is closer to test-time needs but still has limitations.
To address this issue, the paper proposes the concept of "Big Cooperative Learning" (Big Learning). The core idea of Big Learning is to fully exploit the many data sampling demonstrations contained in a single data sample (i.e., sampling and predicting the data from different perspectives), forming a large number of learning tasks that cooperate to approximate the essence of the data. According to the paper, this not only reduces the training-test discrepancy but also improves the model's generalization ability and adaptability.
The main contributions of the paper include:
1. Proposing Big Learning as a unified foundation model training framework and analyzing the assumptions behind existing foundation models.
2. Designing tailored simulation experiments that demonstrate the principles of Big Learning in a lightweight manner and explain the success of foundation models from a learning perspective.
3. Pointing out that Big Learning is a new dimension for upgrading traditional machine learning paradigms: cutting-edge foundation model techniques are fed back into traditional machine learning, thereby revitalizing related applications.
4. As an example, proposing BigLearn-GAN, a variant of the traditional Generative Adversarial Network (GAN) upgraded with Big Learning, with versatile data sampling capabilities.
Through these contributions, the paper aims to provide theoretical support and technical guidance for the further improvement and development of future foundation models.
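The core idea described above, forming many cooperating prediction tasks from a single data sample, can be illustrated with a minimal Python sketch. It is one plausible reading of the summary, not the author's implementation: it assumes a task is a pair of disjoint index sets (observed source positions, positions to predict), and all names (`Task`, `sample_task`, `next_token_task`, `mask_and_predict_task`, `make_example`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a "task" is (source indices, target indices), disjoint,
# meaning "predict the target positions given the source positions".
Task = Tuple[List[int], List[int]]


def sample_task(length: int, rng: random.Random) -> Task:
    """Randomly pick a non-empty target set and a disjoint (possibly empty) source set."""
    indices = list(range(length))
    rng.shuffle(indices)
    n_target = rng.randint(1, length)             # at least one position to predict
    n_source = rng.randint(0, length - n_target)  # the rest may be observed or unused
    target = sorted(indices[:n_target])
    source = sorted(indices[n_target:n_target + n_source])
    return source, target


def next_token_task(t: int) -> Task:
    """Next-token prediction as a special case: observe positions 0..t-1, predict t."""
    return list(range(t)), [t]


def mask_and_predict_task(length: int, masked: Sequence[int]) -> Task:
    """Mask-and-predict as a special case: observe unmasked positions, predict masked ones."""
    masked_set = set(masked)
    source = [i for i in range(length) if i not in masked_set]
    return source, sorted(masked_set)


def make_example(sample: Sequence, task: Task):
    """Turn one data sample and one task into (observed pairs, pairs to predict)."""
    source, target = task
    observed = [(i, sample[i]) for i in source]
    to_predict = [(i, sample[i]) for i in target]
    return observed, to_predict


if __name__ == "__main__":
    rng = random.Random(0)
    sample = ["big", "learning", "unifies", "training", "objectives"]
    # Many different tasks drawn from the same single sample.
    for _ in range(3):
        print(make_example(sample, sample_task(len(sample), rng)))
```

Under this reading, a single universal model would be trained on all such tasks at once, with next-token prediction and mask-and-predict appearing as two particular choices of the index sets rather than as separate objectives.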