Beimingwu: A Learnware Dock System

Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou
2024-01-24
Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond the models' original purposes. In this paradigm, developers worldwide can spontaneously submit their high-performing models to the learnware dock system (formerly known as the learnware market) without revealing their training data. Once the dock system accepts a model, it assigns the model a specification and accommodates it. This specification allows the model to be adequately identified and assembled for reuse according to future users' needs, even if those users have no prior knowledge of the model. This paradigm differs greatly from the current big-model direction, and a learnware dock system housing millions or more high-performing models is expected to offer excellent capabilities both for planned tasks where big models are applicable and for unplanned, specialized, data-sensitive scenarios where big models are not present or applicable.
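To make the submit, specify, identify, and reuse workflow described in the abstract more concrete, here is a minimal Python sketch of a toy dock. All names (`ToyDock`, `Spec`, `Learnware`, `mmd2`) are hypothetical and are not Beimingwu's actual interface. For illustration we assume a specification is a small set of weighted points, and that a user's requirement is matched against stored specifications by squared maximum mean discrepancy (MMD) under a Gaussian kernel. For simplicity the specification is passed in at submission rather than assigned by the dock, and acceptance testing is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


def gaussian_kernel(x: Point, y: Point, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


@dataclass
class Spec:
    """Hypothetical specification: weighted points summarizing a data distribution."""
    points: List[Point]
    weights: List[float]


def mmd2(s: Spec, t: Spec) -> float:
    """Squared MMD between the weighted kernel mean embeddings of s and t."""
    def inner(a: Spec, b: Spec) -> float:
        return sum(wa * wb * gaussian_kernel(pa, pb)
                   for pa, wa in zip(a.points, a.weights)
                   for pb, wb in zip(b.points, b.weights))
    return inner(s, s) + inner(t, t) - 2 * inner(s, t)


@dataclass
class Learnware:
    """A submitted model paired with its specification (hypothetical structure)."""
    name: str
    model: Callable[[Point], float]
    spec: Spec


@dataclass
class ToyDock:
    learnwares: List[Learnware] = field(default_factory=list)

    def submit(self, lw: Learnware) -> None:
        # Assumption: every submission is accepted; a real dock would test it first.
        self.learnwares.append(lw)

    def search(self, requirement: Spec, top_k: int = 1) -> List[Learnware]:
        # Rank learnwares by how close their specification is to the user's requirement.
        return sorted(self.learnwares, key=lambda lw: mmd2(lw.spec, requirement))[:top_k]


if __name__ == "__main__":
    dock = ToyDock()
    dock.submit(Learnware("near-origin", lambda x: 0.0, Spec([(0.0, 0.0)], [1.0])))
    dock.submit(Learnware("far-away", lambda x: 1.0, Spec([(5.0, 5.0)], [1.0])))
    best = dock.search(Spec([(0.2, -0.1)], [1.0]))[0]
    print(best.name)  # near-origin
```

Ranking by a distance between distributions, rather than by model names or metadata, is what would let a user find a suitable model without prior knowledge of it. Beimingwu's actual identification and reuse steps, including assembling several learnwares for one task, are likely more involved than this single nearest-match sketch.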
Software Engineering, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper addresses the challenges of building high-performance models under the traditional machine-learning paradigm, specifically:

1. **Lack of high-quality data**: Training a high-performance machine-learning model usually requires a large amount of high-quality data, which is often difficult to obtain.
2. **Lack of expertise**: Building and optimizing machine-learning models requires substantial expertise and experience, which is a major obstacle for many users.
3. **Catastrophic forgetting**: Incrementally updating an already-trained model may cause it to forget previously learned knowledge.
4. **Difficulty of continual learning**: Existing models struggle to adapt to a constantly changing task environment.
5. **Data privacy and ownership**: Developers are reluctant to share data because doing so raises privacy and intellectual-property concerns.
6. **Unforeseen new tasks**: New real-world tasks are often unpredictable, and existing large-scale models struggle with such specialized scenarios.
7. **Carbon emissions**: Repeatedly training models wastes resources and produces substantial carbon emissions.

To address these problems, the paper proposes Beimingwu, a system based on the learnware paradigm. Instead of building models from scratch, Beimingwu solves new user tasks by uniformly managing and reusing high-performance models submitted by developers worldwide. Specifically, Beimingwu can:

- **Simplify model development**: Users can quickly build and deploy high-performance models even without large amounts of data or specialized expertise.
- **Provide an integrated, extensible architecture**: It supports the entire model life cycle, including submission, testing, organization, identification, deployment, and reuse.
- **Protect the privacy of raw data**: Through statistical specifications such as RKME (reduced kernel mean embedding), users can submit task requirements without revealing their raw data.
- **Support diverse learning tasks**: It can provide effective solutions both for planned tasks and for unforeseen, specialized scenarios.

In this way, Beimingwu not only simplifies the development of machine-learning models but also addresses several key problems of the existing paradigm, providing a solid foundation for future research and applications.
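An RKME-style specification summarizes a dataset by a few weighted points whose kernel mean embedding approximates that of the full data, so only the summary needs to be shared. The following is a simplified Python sketch under our own assumptions: a Gaussian kernel, reduced points chosen by a few rounds of k-means, and weights β obtained in closed form by solving (K_zz + λI)β = (1/n)K_zx·1, which minimizes the squared RKHS distance between the two embeddings plus a small ridge term λ. The function names (`reduced_kme`, `embedding_gap`, `kmeans`, `solve`) are hypothetical, not the learnware package's API, and the original RKME construction may also optimize the point locations. Note that a centroid of a very small cluster can coincide with a raw sample, so this sketch on its own is not a privacy guarantee.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, y: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf(x: Point, y: Point, gamma: float = 0.5) -> float:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sq_dist(x, y))


def kmeans(data: Sequence[Point], m: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd iterations; the m centroids serve as the reduced points."""
    centers = random.Random(seed).sample(list(data), m)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(m)]
        for x in data:
            nearest = min(range(m), key=lambda j: sq_dist(x, centers[j]))
            groups[nearest].append(x)
        # Keep the previous center if a cluster becomes empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def reduced_kme(data: Sequence[Point], m: int, gamma: float = 0.5,
                ridge: float = 1e-6, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Return (points, weights) whose weighted KME approximates the empirical KME of data."""
    z = kmeans(data, m, seed=seed)
    n = len(data)
    k_zz = [[rbf(zi, zj, gamma) + (ridge if i == j else 0.0) for j, zj in enumerate(z)]
            for i, zi in enumerate(z)]
    # Right-hand side: (1/n) * sum_i k(z_j, x_i), the empirical KME evaluated at z_j.
    rhs = [sum(rbf(zj, x, gamma) for x in data) / n for zj in z]
    return z, solve(k_zz, rhs)


def embedding_gap(data: Sequence[Point], z: Sequence[Point],
                  beta: Sequence[float], gamma: float = 0.5) -> float:
    """Squared RKHS distance between the empirical KME and the reduced one."""
    n = len(data)
    xx = sum(rbf(a, b, gamma) for a in data for b in data) / n ** 2
    zz = sum(bi * bj * rbf(zi, zj, gamma) for zi, bi in zip(z, beta) for zj, bj in zip(z, beta))
    xz = sum(bj * rbf(x, zj, gamma) for x in data for zj, bj in zip(z, beta)) / n
    return xx + zz - 2 * xz


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    z, beta = reduced_kme(data, m=10)
    print(len(z), embedding_gap(data, z, beta))
```

In this sketch only the reduced points and their weights would leave the user's machine. The weights are not constrained to sum to one, and `embedding_gap` indicates how faithful the summary is for a given number of points m.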