Abstract: High-dimensional data visualization is crucial in the big data era, and techniques such as t-SNE and UMAP have been widely used in science and engineering. Big data, however, is often distributed across multiple data centers and subject to security and privacy concerns, which creates difficulties for the standard t-SNE and UMAP algorithms. To tackle this challenge, this work proposes Fed-tSNE and Fed-UMAP, which provide high-dimensional data visualization under the framework of federated learning, without exchanging data across clients or sending data to the central server. The main idea of Fed-tSNE and Fed-UMAP is to implicitly learn the distribution information of the data in a federated manner and then estimate the global distance matrix for t-SNE and UMAP. To further enhance the protection of data privacy, we propose Fed-tSNE+ and Fed-UMAP+. We also extend our idea to federated spectral clustering, yielding algorithms for clustering distributed data. In addition to these new algorithms, we offer theoretical guarantees of optimization convergence, distance and similarity estimation, and differential privacy. Experiments on multiple datasets demonstrate that, compared to the original algorithms, the accuracy drops of our federated algorithms are tiny.

What problem does this paper attempt to address?

This paper addresses how to visualize distributed high-dimensional data within the federated learning framework, without exchanging data between clients or sending data to a central server. Specifically, the standard t-SNE and UMAP algorithms face two main challenges when dealing with large-scale distributed data:

1. **Data distribution problem**: In many practical applications (such as mobile devices, Internet of Things networks, medical records, and social media platforms), high-dimensional data are distributed across multiple data centers and are subject to security and privacy restrictions. Running the standard algorithms would require the data centers or clients to share data with each other or send it to a common central server, which risks privacy leakage and creates information security risks.
2. **Computational complexity problem**: t-SNE and UMAP need the pairwise distances or similarities between all data points, so data held by different sources would have to be shared or uploaded to the central server for a global calculation. This requirement increases the communication cost and may be infeasible when the amount of data is very large.

To solve these problems, the paper proposes the following methods:

- **Fed-tSNE and Fed-UMAP**: Implicitly learn the distribution information of the data through federated learning and use it to estimate the global distance matrix, thereby enabling distributed visualization of high-dimensional data.
- **Fed-tSNE+ and Fed-UMAP+**: Further enhance privacy protection by introducing noise.
- **Extension to federated spectral clustering**: Apply the same idea to clustering distributed data.

The paper also provides theoretical guarantees covering optimization convergence, distance and similarity estimation, and differential privacy.

Through these methods, the paper aims to provide a technical solution for efficient visualization of distributed high-dimensional data without violating data privacy.
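
To make the idea of estimating a global distance matrix without pooling raw data more concrete, here is a minimal sketch. It does not reproduce the paper's procedure, which this summary does not spell out. It assumes that all clients and the server already share a small set of reference points (`landmarks`), and it swaps in a Nyström-style kernel approximation as a stand-in for the paper's estimator. The names `gaussian_kernel`, `client_summary`, `server_estimate_distances`, `landmarks`, and `sigma` are all hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def client_summary(X_local, landmarks, sigma=1.0):
    """Runs on a client (hypothetical helper).

    Only kernel values between local points and the shared landmarks are
    returned; the raw rows of X_local are never sent anywhere.
    """
    return gaussian_kernel(X_local, landmarks, sigma)


def server_estimate_distances(summaries, landmarks, sigma=1.0):
    """Runs on the server (hypothetical helper).

    Stacks the client summaries and forms a Nystrom-style estimate of the
    full kernel matrix, then converts it to squared distances in the
    kernel feature space: k(x, x) + k(y, y) - 2 k(x, y), with k(x, x) = 1.
    """
    C = np.vstack(summaries)                          # (n_total, m)
    W = gaussian_kernel(landmarks, landmarks, sigma)  # (m, m)
    K_hat = C @ np.linalg.pinv(W) @ C.T               # (n_total, n_total)
    D2 = np.maximum(2.0 - 2.0 * K_hat, 0.0)           # clip small negatives
    np.fill_diagonal(D2, 0.0)
    return D2


# Toy run with two clients. Landmarks are drawn at random here purely for
# illustration; in this sketch nothing about them is learned.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(40, 5))
X_b = rng.normal(size=(30, 5)) + 2.0
landmarks = rng.normal(size=(15, 5)) + 1.0

D2_est = server_estimate_distances(
    [client_summary(X_a, landmarks), client_summary(X_b, landmarks)],
    landmarks,
)
```

In this sketch the quality of the estimate depends entirely on how well the landmarks cover the data, which is presumably why the paper learns distribution information in a federated manner rather than picking points at random. A matrix like `D2_est` could then be handed to a t-SNE or UMAP implementation that accepts precomputed distances, taking the square root first if unsquared distances are expected. Sending kernel values still reveals some information about the local data, which is the kind of leakage the noise-based variants are meant to limit.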

Federated t-SNE and UMAP for Distributed Data Visualization

Federated Visualization: A Privacy-Preserving Strategy for Aggregated Visual Query

Federated Matrix Factorization: Algorithm Design and Application to Data Clustering

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value.

FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation

Disentangling data distribution for Federated Learning

SMAP: A Joint Dimensionality Reduction Scheme for Secure Multi-Party Visualization

Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions

Fast and Privacy-Preserving Federated Joint Estimator of Multi-sUGMs

A federated fuzzy c-means clustering algorithm

Federated Transfer Learning with Differential Privacy

Graph Federated Learning Based on the Decentralized Framework

From distributed machine learning to federated learning: In the view of data privacy and security

Communication-Efficient and Privacy-Preserving Large-Scale Federated Learning Counteracting Heterogeneity

An Analysis of the t-SNE Algorithm for Data Visualization

Federated Learning with Data-Agnostic Distribution Fusion

Distributed Modelling Approaches for Data Privacy Preserving

Deep Federated Anomaly Detection for Multivariate Time Series Data

GraphFederator: Federated Visual Analysis for Multi-party Graphs

Tackling Data Heterogeneity in Federated Time Series Forecasting

Federated Two Stage Decoupling With Adaptive Personalization Layers