Abstract: Edge Intelligence (EI) allows Artificial Intelligence (AI) applications to run at the edge, where data analysis and decision-making can be performed in real time and close to data sources. To protect data privacy and unify data silos distributed among end devices in EI, Federated Learning (FL) has been proposed for collaboratively training shared AI models across multiple devices without compromising data privacy. However, prevailing FL approaches cannot guarantee model generalization and adaptation on heterogeneous clients. Recently, Personalized Federated Learning (PFL) has drawn growing attention in EI, as it strikes a productive balance between the local-specific training requirements inherent in devices and global-generalized optimization objectives for satisfactory performance. However, most existing PFL methods are based on the Parameters Interaction-based Architecture (PIA) represented by FedAvg, which suffers from unaffordable communication burdens due to large-scale parameter transmission between devices and the edge server. In contrast, the Logits Interaction-based Architecture (LIA) updates model parameters through logits transfer and, compared to PIA, offers lightweight communication and support for heterogeneous on-device models. Nevertheless, previous LIA methods achieve satisfactory performance only by either relying on unrealistic public datasets or increasing communication overhead by transmitting additional information beyond logits. To tackle this dilemma, we propose a knowledge cache-driven PFL architecture, named FedCache, which maintains a knowledge cache on the server for fetching personalized knowledge from the samples whose hashes are similar to that of each given on-device sample. During the training phase, ensemble distillation is applied to on-device models for constructive optimization with personalized knowledge transferred from the server-side knowledge cache. Empirical experiments on four datasets demonstrate that FedCache achieves performance comparable to state-of-the-art PFL approaches, with more than two orders of magnitude improvement in communication efficiency. Our code and DEMO are available at https://github.com/wuzhiyuan2000/FedCache.
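
The cache-and-distill idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `KnowledgeCache`, the use of cosine similarity over hash vectors, the brute-force neighbor search, the plain averaging of neighbors' logits, and the weight `beta` are all assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two hash vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class KnowledgeCache:
    """Hypothetical server-side cache: one hash and one logits vector per sample."""

    def __init__(self, num_neighbors=2):
        self.num_neighbors = num_neighbors
        self.hashes = {}   # sample_id -> hash vector
        self.logits = {}   # sample_id -> latest uploaded logits

    def register(self, sample_id, hash_vec):
        # Devices upload sample hashes once, before training (assumption).
        self.hashes[sample_id] = list(hash_vec)

    def update(self, sample_id, logits):
        # Devices upload fresh logits during training.
        self.logits[sample_id] = list(logits)

    def fetch(self, sample_id):
        """Average the logits of the samples whose hashes are most similar."""
        query = self.hashes[sample_id]
        scored = [(cosine(query, h), sid) for sid, h in self.hashes.items()
                  if sid != sample_id and sid in self.logits]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [self.logits[sid] for _, sid in scored[:self.num_neighbors]]
        if not top:
            return None  # nothing cached yet
        return [sum(col) / len(top) for col in zip(*top)]

def distillation_loss(student_logits, label, teacher_logits, beta=1.0):
    """Cross-entropy on the label plus beta * KL(teacher || student)."""
    p = softmax(student_logits)
    ce = -math.log(p[label])
    if teacher_logits is None:
        return ce
    q = softmax(teacher_logits)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    return ce + beta * kl

# Usage: sample "a" fetches knowledge from its two nearest neighbors by hash.
cache = KnowledgeCache(num_neighbors=2)
for sid, h in [("a", [1, 0]), ("b", [0.9, 0.1]), ("c", [0, 1]), ("d", [1, 0.05])]:
    cache.register(sid, h)
cache.update("b", [2.0, 0.0])
cache.update("c", [0.0, 2.0])
cache.update("d", [4.0, 0.0])
teacher = cache.fetch("a")
loss = distillation_loss([1.0, 0.5], 0, teacher)
```

In this sketch only hashes and logits cross the network, which is the property the abstract credits for the communication savings over parameter exchange. A practical system would likely replace the linear scan in `fetch` with an approximate nearest-neighbor index.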

Fleche: an efficient GPU embedding cache for personalized recommendations

Mixed-Precision Embedding Using a Cache

Put an Elephant into a Fridge

Optimizing Inference Quality with SmartNIC for Recommendation System

Accelerating Recommendation System Training by Leveraging Popular Choices

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models

Applying Deep Learning to the Cache Replacement Problem

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs

TensorCache: Reconstructing Memory Architecture with SRAM-Based In-Cache Computing for Efficient Tensor Computations in GPGPUs

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

A Long-Short-Term Fusion Approach for Video Cache

DeepCache: Principled Cache for Mobile Deep Vision

Advanced hybrid MRAM based novel GPU cache system for graphic processing with high efficiency

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference

FedCache: A Knowledge Cache-Driven Federated Learning Architecture for Personalized Edge Intelligence