Adaptive Semantic Token Selection for AI-native Goal-oriented Communications

Alessio Devoto, Simone Petruzzi, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane
2024-04-25
Abstract: In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation. Transformers have become the standard architecture for pretraining large-scale vision and text models, and preliminary results have shown promising performance in deep joint source-channel coding (JSCC) as well. Here, we consider a dynamic model where communication happens over a channel with variable latency and bandwidth constraints. Leveraging recent works on conditional computation, we exploit the structure of the transformer blocks and the multihead attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. This is done dynamically, on a per-input basis, with a rate that can be chosen as an additional input by the user. We show that our model improves over state-of-the-art token selection mechanisms, exhibiting high accuracy for a wide range of latency and bandwidth constraints, without the need for deploying multiple architectures tailored to each constraint. Last, but not least, the proposed token selection mechanism helps extract powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.
Information Theory, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the bandwidth and computational constraints that AI-native goal-oriented communication systems face at inference time. The authors propose a transformer-based design that operates under dynamic constraints: communication takes place over a channel whose latency and bandwidth budgets can vary. Instead of deploying a separately tailored architecture for each constraint, they introduce a trainable semantic token selection mechanism that dynamically selects the relevant tokens (e.g., image patches) of each input, at a rate that the user supplies as an additional input during inference. Compared with state-of-the-art token selection mechanisms, the resulting model maintains high accuracy across a wide range of latency and bandwidth constraints, and the experiments show that it outperforms existing token selection and compression baselines in both accuracy and communication efficiency. Finally, because the mechanism makes explicit which tokens carry the task-relevant semantics, its selections are easy to understand and explain, which points toward interpretable-by-design models for next-generation AI-native communication systems.
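To make the idea of a user-chosen selection rate concrete, the following Python sketch keeps the highest-scoring fraction of tokens for a given rate and derives that rate from a bit budget. It is a minimal illustration under stated assumptions, not the authors' implementation: the linear scorer `linear_scores`, the fixed `bits_per_token` cost model, and all function names are hypothetical. In the paper, the selector is a trainable component built on the transformer blocks and multihead attention that takes the rate as an additional input; training such a selector end-to-end would also need a differentiable relaxation of the hard top-k selection used here.

```python
import math


def linear_scores(tokens, weights):
    """Hypothetical relevance scorer: dot product of each token with a weight vector.
    A trained model would produce these scores with learned layers instead."""
    return [sum(t * w for t, w in zip(tok, weights)) for tok in tokens]


def rate_from_bandwidth(bits_available, n_tokens, bits_per_token):
    """Largest keep-rate whose payload fits the bit budget.
    Assumes (hypothetically) that every transmitted token costs the same number of bits."""
    if bits_per_token <= 0 or n_tokens <= 0:
        raise ValueError("bits_per_token and n_tokens must be positive")
    max_tokens = bits_available // bits_per_token
    if max_tokens < 1:
        raise ValueError("budget too small to send a single token")
    return min(1.0, max_tokens / n_tokens)


def select_tokens(tokens, scores, rate):
    """Keep about rate * N of the highest-scoring tokens, preserving their original order.

    rate is supplied at inference time, so one scorer serves every budget.
    Returns (kept_indices, kept_tokens).
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    n = len(tokens)
    # Round to the nearest integer (robust to tiny float error), keep at least one token.
    k = max(1, min(n, round(rate * n)))
    # Indices of the k largest scores (hard top-k, not differentiable).
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    keep = sorted(top)  # restore the spatial order of the patches
    return keep, [tokens[i] for i in keep]


# Toy example: eight one-dimensional "patches" and a 2048-bit budget at 512 bits per token.
patches = [[0.1], [0.9], [0.4], [0.8], [0.2], [0.7], [0.3], [0.6]]
rate = rate_from_bandwidth(bits_available=2048, n_tokens=len(patches), bits_per_token=512)
keep, sent = select_tokens(patches, linear_scores(patches, weights=[1.0]), rate)
print(rate, keep)  # 0.5 [1, 3, 5, 7]
```

Because the rate is an argument rather than a fixed hyperparameter, the same scorer can serve any bandwidth budget, which mirrors the paper's point about not deploying one architecture per constraint; and since the scores depend on the input, different inputs keep different patches.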