Adaptive Semantic Token Selection for AI-native Goal-oriented Communications

Alessio Devoto, Simone Petruzzi, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane

2024-04-25

Abstract: In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation. Transformers have become the standard architecture for pretraining large-scale vision and text models, and preliminary results have shown promising performance also in deep joint source-channel coding (JSCC). Here, we consider a dynamic model where communication happens over a channel with variable latency and bandwidth constraints. Leveraging recent works on conditional computation, we exploit the structure of the transformer blocks and the multihead attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. This is done dynamically, on a per-input basis, with a rate that can be chosen as an additional input by the user. We show that our model improves over state-of-the-art token selection mechanisms, exhibiting high accuracy for a wide range of latency and bandwidth constraints, without the need for deploying multiple architectures tailored to each constraint. Last, but not least, the proposed token selection mechanism helps extract powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.

Information Theory, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses goal-oriented communication when the bandwidth and computation available at inference time vary. The authors propose a transformer-based design built around a trainable semantic token selection mechanism, which exploits the structure of the transformer blocks and the multihead attention operator to learn which tokens (e.g., image patches) of the input signal are relevant. Selection is performed dynamically, on a per-input basis, and the selection rate is supplied by the user as an additional input, so a single model can serve a wide range of latency and bandwidth constraints instead of requiring a separate architecture tailored to each one. The authors report that the model improves over state-of-the-art token selection mechanisms and maintains high accuracy across these constraints. Because the selected tokens expose which parts of the input the model treats as semantically relevant, the mechanism is also easy to understand and explain, which the authors present as a step toward interpretable-by-design models for next-generation AI-native communication systems.
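To make the idea of a user-chosen selection rate concrete, here is a minimal sketch of budget-conditioned token selection. It is not the authors' method: the paper's mechanism is trainable and embedded in the transformer, whereas this example assumes the per-token relevance scores are already given and simply keeps the top-scoring fraction. The function name `select_tokens`, the `budget` parameter, and the toy scores are all hypothetical.

```python
import math
from typing import List, Sequence


def select_tokens(scores: Sequence[float], budget: float) -> List[int]:
    """Return the indices of the tokens to keep, in their original order.

    scores: one relevance score per token (assumed given here; in a trained
            model these would come from a learned scoring module).
    budget: fraction of tokens to keep, in (0, 1].
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n = len(scores)
    if n == 0:
        return []
    # Keep at least one token so something is always transmitted.
    k = max(1, math.ceil(budget * n))
    # Rank token indices by score, highest first; ties favour the earlier token.
    ranked = sorted(range(n), key=lambda i: (-scores[i], i))
    # Restore the original order so positional information is preserved.
    return sorted(ranked[:k])


# Toy example: 8 image patches with made-up relevance scores.
patch_scores = [0.10, 0.85, 0.05, 0.60, 0.20, 0.90, 0.15, 0.40]
for budget in (0.25, 0.5, 1.0):
    kept = select_tokens(patch_scores, budget)
    print(f"budget={budget:.2f} -> keep patches {kept}")
```

The same scoring pass serves every budget, which mirrors the point that one model can cover many bandwidth constraints; only the number of tokens kept changes with the requested rate.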

Transformer-Aided Semantic Communications

Demo: Real-Time Semantic Communications with a Vision Transformer

Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

Semantic Communication with Adaptive Universal Transformer

AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network

Efficient Semantic Communication Through Transformer-Aided Compression

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Temporal Prompt Engineering for Generative Semantic Communication

Generative Semantic Communication for Text-to-Speech Synthesis

SNN-SC: A Spiking Semantic Communication Framework for Collaborative Intelligence

Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers

Efficient Video Transformers with Spatial-Temporal Token Selection

TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR

Transformer-based Joint Source Channel Coding for Textual Semantic Communication

On the Role of ViT and CNN in Semantic Communications: Analysis and Prototype Validation

Deep Learning-Enabled Semantic Communication Systems With Task-Unaware Transmitter and Dynamic Data

Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing

Deep Learning Enabled Task-Oriented Semantic Communication for Memory-Limited Devices

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

Accelerating Transducers through Adjacent Token Merging