Abstract: Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw data distributed across edge devices. However, edge devices often struggle with high computation and communication costs imposed by SSL and FL algorithms. To tackle this hindrance, we propose LW-FedSSL, a layer-wise federated self-supervised learning approach that allows edge devices to incrementally train a single layer of the model at a time. We introduce server-side calibration and representation alignment mechanisms to ensure LW-FedSSL delivers performance on par with conventional federated self-supervised learning (FedSSL) while significantly lowering resource demands. In a pure layer-wise training scheme, training one layer at a time may limit effective interaction between different layers of the model. The server-side calibration mechanism takes advantage of the resource-rich FL server to ensure smooth collaboration between different layers of the global model. During local training, the representation alignment mechanism encourages closeness between representations of local models and those of the global model, thereby preserving the layer cohesion established by server-side calibration. With the proposed mechanisms, LW-FedSSL achieves a $3.3 \times$ reduction in memory usage, $2.1 \times$ fewer computational operations (FLOPs), and a $3.2 \times$ lower communication cost while maintaining the same level of performance as its end-to-end training counterpart. Additionally, we explore a progressive training strategy called Prog-FedSSL, which matches end-to-end training in memory requirements but offers a $1.8 \times$ reduction in FLOPs and communication costs. Although Prog-FedSSL is not as resource-efficient as LW-FedSSL, its performance improvements make it a suitable candidate for FL environments with more lenient resource constraints.
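The difference between the three training schemes can be illustrated with a minimal sketch of which layers a client would update at each stage. It rests on assumptions the abstract does not state: one stage per layer, equally sized layers, a fixed number of rounds per stage, and a progressive scheme that trains every layer added so far. The names `trainable_layers` and `uploaded_layer_count` are hypothetical, and the toy counts ignore server-side calibration and download traffic, so they are not meant to reproduce the reported reductions.

```python
from typing import List


def trainable_layers(stage: int, mode: str, num_layers: int) -> List[int]:
    """Return the 1-based indices of layers a client updates during `stage`.

    Assumption (not from the abstract): stage s introduces layer s.
    """
    if not 1 <= stage <= num_layers:
        raise ValueError("stage must be between 1 and num_layers")
    if mode == "layer-wise":
        # Only the newest layer is trained; earlier layers stay frozen.
        return [stage]
    if mode == "progressive":
        # Assumed behaviour: all layers added so far are trained together.
        return list(range(1, stage + 1))
    if mode == "end-to-end":
        # The full model is trained in every round.
        return list(range(1, num_layers + 1))
    raise ValueError(f"unknown mode: {mode}")


def uploaded_layer_count(mode: str, num_layers: int, rounds_per_stage: int) -> int:
    """Count layers uploaded by one client over the whole schedule.

    Toy cost model: each round, the client uploads exactly the layers it trained.
    """
    total = 0
    for stage in range(1, num_layers + 1):
        total += rounds_per_stage * len(trainable_layers(stage, mode, num_layers))
    return total


if __name__ == "__main__":
    # Hypothetical 12-layer encoder with one round per stage.
    for mode in ("layer-wise", "progressive", "end-to-end"):
        print(mode, uploaded_layer_count(mode, num_layers=12, rounds_per_stage=1))
```

Under this toy model the layer-wise schedule uploads the fewest layers, the progressive schedule sits in between, and end-to-end training uploads the full model every round, which mirrors the ordering of resource demands described in the abstract.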

Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding

Federated Representation Learning for Automatic Speech Recognition

On‐the‐Job Search and the Wage Distribution

Federated Learning With Highly Imbalanced Audio Data

A Hybrid Self-Supervised Learning Framework for Vertical Federated Learning

Semi-Supervised Federated Learning for Keyword Spotting

FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation

Investigating Self-Supervised Learning for Speech Enhancement and Separation

Federated Learning for Audio Semantic Communication

A Generalized Look at Federated Learning: Survey and Perspectives

Federated Semi-Supervised Learning with Annotation Heterogeneity

Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification

Divergence-aware Federated Self-Supervised Learning

Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition

LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning

Federated Large Language Models: Current Progress and Future Directions

OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition

Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder