Abstract: Many large vision models have been deployed on the cloud for real-time services. Meanwhile, fresh samples are continuously generated on the served mobile device. How to leverage the device-side samples to improve the cloud-side large model becomes a practical requirement, but falls into the dilemma of no raw sample up-link and no large model down-link. Specifically, the user may opt out of sharing raw samples with the cloud due to the concern of privacy or communication overhead, while the size of some large vision models far exceeds the mobile device's runtime capacity. In this work, we propose a device-cloud collaborative controlled learning framework, called DC-CCL, enabling a cloud-side large vision model that cannot be directly deployed on the mobile device to still benefit from the device-side local samples. In particular, DC-CCL vertically splits the base model into two submodels, one large submodel for learning from the cloud-side samples and the other small submodel for learning from the device-side samples and performing device-cloud knowledge fusion. Nevertheless, on-device training of the small submodel requires the output of the cloud-side large submodel to compute the desired gradients. DC-CCL thus introduces a light-weight model to mimic the large cloud-side submodel with knowledge distillation, which can be offloaded to the mobile device to control its small submodel's optimization direction. Given the decoupling nature of two submodels in collaborative learning, DC-CCL also allows the cloud to take a pre-trained model and the mobile device to take another model with a different backbone architecture.
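
The knowledge-distillation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard temperature-softened KL-divergence objective, which the abstract does not specify; the function names, the temperature value, and the logits are hypothetical and only serve the illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: this is a generic distillation loss, not necessarily
    the exact objective used by DC-CCL.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one cloud-side sample with three classes.
large_submodel_logits = [4.0, 1.0, -2.0]  # teacher: cloud-side large submodel
control_model_logits = [3.0, 1.5, -1.0]   # student: lightweight control model

loss = distillation_loss(large_submodel_logits, control_model_logits)
print(f"distillation loss: {loss:.4f}")
```

Minimizing this loss over cloud-side samples pushes the control model's outputs toward those of the large submodel, which is what would let it stand in for the large submodel on the device.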

What problem does this paper attempt to address?

This paper asks: how can large-scale vision models in the cloud learn from new samples on mobile devices, and thereby improve their performance, without uploading the raw samples (to protect user privacy and reduce communication overhead) and without downloading the large model (which exceeds the resources of a mobile device)?

### Problem Background

1. **Large-scale Vision Models Deployed in the Cloud**: A growing number of deep vision models are deployed on cloud servers to provide intelligent services for mobile applications, such as image recognition, live-stream highlight recognition, and video analysis.
2. **New Samples Generated on the Device Side**: As users interact with these applications, mobile devices continuously generate new samples with user feedback.
3. **Privacy and Communication Efficiency Issues**:
   - Users may be reluctant to upload raw samples to the cloud because of privacy concerns or communication overhead.
   - Large-scale vision models have so many parameters that they exceed the runtime capacity of mobile devices and cannot be trained on the device directly.

### Specific Challenges

- **Data Upload Limitations**: Local visual samples on mobile devices contain sensitive information and should not be uploaded to the cloud, in order to protect user privacy. Visual data is also typically large, so uploading it incurs high communication costs.
- **Model Download Limitations**: Cloud-side vision models have so many parameters that they exceed the runtime capacity of mobile devices, so they cannot be downloaded to the device for local training.

### Solutions

To address these challenges, the paper proposes a new framework, DC-CCL (Device-Cloud Collaborative Controlled Learning), with the following goals:

1. **No Need to Upload Original Samples**: A lightweight control model mimics the output of the cloud-side submodel on the device, so training on device-side samples happens locally and the raw samples are never uploaded.
2. **No Need to Download Large-scale Models**: The base model is split vertically into a cloud-side submodel and a device-cloud co-submodel, and only the lightweight co-submodel is deployed on the mobile device.
3. **Knowledge Fusion**: Through collaborative learning, the cloud-side large model benefits from new device-side samples: the small submodel learns from them and fuses device-side and cloud-side knowledge, improving the performance of the overall model.

### Main Contributions

1. **A New Device-Cloud Collaborative Learning Framework**: The paper presents DC-CCL as the first framework that enables large-scale cloud-side vision models to learn from new device-side samples without uploading the raw samples and without downloading the large model.
2. **Decoupling Device-side Learning from the Complete Model**: Vertically splitting the model and introducing a lightweight control model makes local optimization on the device possible.
3. **Experimental Results**: On multiple public datasets and with different models, DC-CCL outperforms baselines that use only cloud-side samples or only small device-affordable models, with accuracy improvements of 3.52% to 41.32%.

### Summary

The DC-CCL framework enables large-scale cloud-side vision models to learn from new samples on mobile devices and improve their performance, while protecting user privacy and reducing communication overhead.
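
How a control model can steer the small submodel's optimization may be easier to see in a toy example. The sketch below assumes that the two submodels' logits are fused by simple addition and that the loss is softmax cross-entropy; the summary does not state the fusion rule, so both are labeled assumptions, and all names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def small_submodel_logit_grad(control_logits, small_logits, label):
    """Gradient of cross-entropy w.r.t. the small submodel's logits.

    Assumption: fused logits = control_logits + small_logits, with the
    control model frozen on the device, so it acts as a constant offset.
    """
    fused = [c + s for c, s in zip(control_logits, small_logits)]
    probs = softmax(fused)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

small_logits = [0.0, 0.0, 0.0]  # untrained small submodel, three classes
label = 0                       # ground-truth class of a device-side sample

# No guidance: the control output is all zeros.
grad_without_control = small_submodel_logit_grad([0.0, 0.0, 0.0], small_logits, label)
# The control model already favors the correct class.
grad_with_control = small_submodel_logit_grad([3.0, 0.0, 0.0], small_logits, label)

print(grad_without_control)
print(grad_with_control)
```

Under these assumptions, the gradient with respect to the small submodel's logits depends on the control model's output: where the control model is already confident and correct, the small submodel receives a much smaller update, so it mainly learns what the cloud-side knowledge does not yet cover.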

DC-CCL: Device-Cloud Collaborative Controlled Learning for Large Vision Models

Collaborative Learning Between Cloud and End Devices

Close the Gap Between Deep Learning and Mobile Intelligence by Incorporating Training in the Loop

Device-Cloud Collaborative Learning for Recommendation

Edge-cloud Collaborative Learning with Federated and Centralized Features

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world

ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation

On-Device Learning for Model Personalization with Large-Scale Cloud-Coordinated Domain Adaption

A Cloud-Edge Collaboration Framework for Cognitive Service

MDLdroid: a ChainSGD-reduce Approach to Mobile Deep Learning for Personal Mobile Sensing

Delta: A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning

MEC-DA: Memory-Efficient Collaborative Domain Adaptation for Mobile Edge Devices

Cloud-Edge Collaborative Large Model Services: Challenges and Solutions

FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource Constrained Devices using Divide and Collaborative Training

Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

Device-Cloud Collaborative Recommendation via Meta Controller

On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems

Toward Decentralized and Collaborative Deep Learning Inference for Intelligent IoT Devices

CrowdLearning: A Decentralized Distributed Training Framework Based on Collectives of Trusted AIoT Devices

Towards Collaborative Intelligence Friendly Architectures for Deep Learning