Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes. In this paradigm, developers worldwide can submit their high-performing models spontaneously to the learnware dock system (formerly known as learnware market) without revealing their training data. Once the dock system accepts the model, it assigns a specification and accommodates the model. This specification allows the model to be adequately identified and assembled for reuse according to future users' needs, even if they have no prior knowledge of the model. This paradigm differs greatly from the current big model direction, and it is expected that a learnware dock system housing millions or more high-performing models could offer excellent capabilities both for planned tasks where big models are applicable and for unplanned, specialized, data-sensitive scenarios where big models are not present or applicable.

What problem does this paper attempt to address?

This paper addresses several challenges in building high-performance models under the traditional machine learning paradigm:

1. **Lack of high-quality data**: Training a high-performance machine learning model usually requires a large amount of high-quality data, which is often difficult to obtain.
2. **Lack of professional skills**: Building and optimizing machine learning models requires substantial expertise and experience, which is a major obstacle for many users.
3. **Catastrophic forgetting**: Incrementally updating an already-trained model can cause it to forget previously learned knowledge.
4. **Difficulty in achieving continual learning**: Existing models have difficulty adapting to an ever-changing task environment.
5. **Data privacy and ownership issues**: Developers find it difficult to share data because doing so raises privacy and intellectual property concerns.
6. **Unforeseen new tasks**: New real-world tasks are often unpredictable, and existing large-scale models have difficulty handling such specialized scenarios.
7. **Carbon emissions**: Repeatedly training models wastes resources and generates significant carbon emissions.

To address these problems, the paper proposes the Beimingwu system, which is based on the learnware paradigm. The system aims to solve new user tasks by uniformly managing and reusing high-performance models submitted by developers worldwide, so that users do not have to build models from scratch. Specifically, the Beimingwu system can:

- **Simplify the model development process**: Users without large amounts of data or professional skills can still build and deploy high-performance models quickly.
- **Provide an integrated and extensible architecture design**: It supports the entire model life cycle, including submission, testing, organization, identification, deployment, and reuse.
- **Protect the privacy of the original data**: By generating statistical specifications such as the reduced kernel mean embedding (RKME), users can submit task requirements without revealing the original data.
- **Support diverse learning tasks**: Beimingwu is intended to serve both planned tasks and unforeseen, specialized scenarios.

In this way, the Beimingwu system simplifies the development of machine learning models and addresses several key problems in the existing paradigm, providing a foundation for future research and applications.
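To make the idea of matching by statistical specification more concrete, the following is a minimal sketch and not the Beimingwu API. It assumes a specification can be represented as a small weighted point set and compared with the user's own specification through the squared maximum mean discrepancy (MMD) under a Gaussian kernel. All names here (`toy_specification`, `search_learnware`, `dock`) are hypothetical, and the specification is only a uniform random subsample, whereas an actual RKME optimizes both the points and their weights.

```python
import math
import random


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def weighted_kernel_sum(points_a, weights_a, points_b, weights_b, gamma):
    """Sum of w_a * w_b * k(a, b) over all pairs of points."""
    return sum(
        wa * wb * gaussian_kernel(pa, pb, gamma)
        for pa, wa in zip(points_a, weights_a)
        for pb, wb in zip(points_b, weights_b)
    )


def mmd_squared(spec_a, spec_b, gamma=0.5):
    """Squared MMD between two specifications given as (points, weights)."""
    (pa, wa), (pb, wb) = spec_a, spec_b
    return (
        weighted_kernel_sum(pa, wa, pa, wa, gamma)
        - 2.0 * weighted_kernel_sum(pa, wa, pb, wb, gamma)
        + weighted_kernel_sum(pb, wb, pb, wb, gamma)
    )


def toy_specification(samples, size, seed=0):
    """Hypothetical stand-in for a reduced set: a uniform-weight subsample.

    Assumption: a real RKME would optimize points and weights instead.
    """
    rng = random.Random(seed)
    points = rng.sample(samples, size)
    return points, [1.0 / size] * size


def search_learnware(dock, user_spec, gamma=0.5):
    """Return the name of the model whose specification is closest in MMD."""
    return min(dock, key=lambda name: mmd_squared(dock[name], user_spec, gamma))


# Toy data: two developers trained on different regions of a 2-D space.
rng = random.Random(42)


def cloud(center, n):
    """Draw n points from an isotropic unit-variance Gaussian around center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


dock = {
    "model_a": toy_specification(cloud((0.0, 0.0), 200), size=10, seed=1),
    "model_b": toy_specification(cloud((5.0, 5.0), 200), size=10, seed=2),
}

# The user shares only a specification of their data, never the raw samples.
user_spec = toy_specification(cloud((5.0, 5.0), 200), size=10, seed=3)
best = search_learnware(dock, user_spec)
print(best)
```

In this toy setup the user's data lies near the region covered by `model_b`, so that entry should be selected. Only the small weighted point sets are exchanged between parties, which illustrates the role the summary attributes to statistical specifications.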

Beimingwu: A Learnware Dock System
