Abstract: In 6G, Channel State Information (CSI), as a versatile resource, can be used to perform multiple tasks with the help of Machine Learning (ML). However, most studies focus on environment- and task-specific ML models, which collectively incur significant system costs. This paper investigates the generalization capability of ML models across diverse Base Station (BS) environments and different CSI-based tasks. Inspired by the powerful cross-modal comprehension and generalization capabilities of image-text multi-modal models, we propose a novel two-stage multi-modal pre-training and downstream task adaptation paradigm to enable the cost-effective execution of multiple CSI-based downstream tasks across multiple BSs. In the pre-training stage, a multi-modal universal model aligns CSI with the associated environment descriptions, characterized by BS positions, User Equipment (UE) positions, and Line-Of-Sight (LOS)/Non-Line-Of-Sight (NLOS) status, in the same embedding space through contrastive learning, which provides cross-BS capability and yields task-agnostic CSI representations. In the downstream task adaptation stage, the frozen pre-trained model can either address the classification task directly through Zero-Shot Learning (ZSL) or be cascaded with a lightweight network that transforms universal CSI representations into task-affine features through Task-Oriented Fine-Tuning (TOFT). Compared with BS- and task-specific methods, our paradigm achieves comparable accuracy while using fewer than 0.257% of the tuning parameters for LOS/NLOS identification and single/multi-BS positioning tasks in unseen BS environments.
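The two stages described above can be illustrated with a minimal sketch. The abstract does not specify the loss or the encoders, so this assumes a symmetric InfoNCE objective of the kind used by image-text models, with a temperature of 0.07 chosen only for illustration. The function names (`contrastive_loss`, `zero_shot_classify`) are hypothetical, and the inputs stand in for the outputs of a CSI encoder and an environment-description encoder that are not shown here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(csi_emb, env_emb, temperature=0.07):
    # Assumed symmetric InfoNCE loss: row i of csi_emb and row i of env_emb
    # form a matched (CSI, environment description) pair; all other rows in
    # the batch act as negatives.
    csi = l2_normalize(csi_emb)
    env = l2_normalize(env_emb)
    logits = csi @ env.T / temperature
    idx = np.arange(logits.shape[0])
    loss_csi_to_env = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_env_to_csi = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_csi_to_env + loss_env_to_csi)

def zero_shot_classify(csi_emb, class_env_emb):
    # Zero-shot classification with frozen embeddings: each class (for example
    # LOS or NLOS) is represented by the embedding of a candidate environment
    # description, and the most similar one is selected.
    sims = l2_normalize(csi_emb) @ l2_normalize(class_env_emb).T
    return sims.argmax(axis=1)
```

In this sketch, zero-shot classification needs no trainable parameters at all, since it only compares frozen embeddings. A task such as positioning would instead attach a small trainable head to the frozen CSI embedding, which corresponds to the TOFT branch described in the abstract.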

6G-Oriented CSI-Based Multi-Modal Pre-Training and Downstream Task Adaptation Paradigm