Efficient Semantic Communication Through Transformer-Aided Compression

Matin Mortaheb, Mohammad A. Amir Khojastepour, Sennur Ulukus
2024-12-03
Abstract: Transformers, known for their attention mechanisms, have proven highly effective in focusing on critical elements within complex data. This feature can effectively be used to address the time-varying channels in wireless communication systems. In this work, we introduce a channel-aware adaptive framework for semantic communication, where different regions of the image are encoded and compressed based on their semantic content. By employing vision transformers, we interpret the attention mask as a measure of the semantic contents of the patches and dynamically categorize the patches to be compressed at various rates as a function of the instantaneous channel bandwidth. Our method enhances communication efficiency by adapting the encoding resolution to the content's relevance, ensuring that even in highly constrained environments, critical information is preserved. We evaluate the proposed adaptive transmission framework using the TinyImageNet dataset, measuring both reconstruction quality and accuracy. The results demonstrate that our approach maintains high semantic fidelity while optimizing bandwidth, providing an effective solution for transmitting multi-resolution data in limited bandwidth conditions.
Machine Learning, Computer Vision and Pattern Recognition, Information Theory, Signal Processing
What problem does this paper attempt to address?
This paper addresses the problem of transmitting image data efficiently over a bandwidth-limited channel while preserving the image's key semantic information. Specifically, it proposes a Transformer-based adaptive multi-resolution semantic communication framework that optimizes bandwidth usage by dynamically adjusting the encoding resolution of different image regions, so that key information is retained even in highly constrained environments. The method is particularly suited to advanced immersive applications in 6G communication systems, such as holographic telepresence and tactile/somatosensory communication. These applications require not only accurate data transmission but also that the transmitted content achieve specific goals in real time, usually under strict bandwidth and latency limits. The main contribution of the paper is an end-to-end goal-oriented communication system. The system uses a Transformer to evaluate the semantic importance of different image regions and encodes these regions according to the requirements of the task. It generates multi-level quantized attention masks and encodes the image data at different resolutions according to the channel rate available in real time, so that the quality at which each part of the image is reconstructed at the receiver matches its semantic importance. Experimental results show that, across different channel conditions, the multi-resolution framework significantly improves performance over traditional methods, especially in task accuracy.
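To make the idea of a multi-level quantized attention mask more concrete, the following is a minimal Python sketch of how per-patch attention scores might be turned into resolution levels under a channel budget. It is an illustration only, not the authors' algorithm: the function name `quantize_attention_mask`, the per-level bit costs, the bit budget, and the greedy "most-attended patch first" rule are all assumptions introduced here.

```python
from typing import List, Sequence


def quantize_attention_mask(
    scores: Sequence[float],
    level_costs: Sequence[int],
    budget: int,
) -> List[int]:
    """Assign each patch a resolution level (0 = coarsest) within a bit budget.

    scores      -- per-patch attention scores (higher = more important)
    level_costs -- hypothetical cost in bits of one patch at each level,
                   assumed strictly increasing
    budget      -- hypothetical total number of bits the channel can carry
    """
    n = len(scores)
    levels = [0] * n

    # Every patch is sent at least at the coarsest level.
    spent = n * level_costs[0]
    if spent > budget:
        raise ValueError("budget too small for the coarsest level")

    # Visit patches from most to least attended (an assumed greedy rule).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    for i in order:
        # Raise this patch's level while the remaining budget allows it.
        while levels[i] + 1 < len(level_costs):
            step = level_costs[levels[i] + 1] - level_costs[levels[i]]
            if spent + step > budget:
                break
            levels[i] += 1
            spent += step
    return levels


# Five patches, three resolution levels costing 2, 4 and 8 bits per patch.
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
costs = [2, 4, 8]

print(quantize_attention_mask(scores, costs, budget=40))  # good channel
print(quantize_attention_mask(scores, costs, budget=24))  # constrained channel
print(quantize_attention_mask(scores, costs, budget=10))  # very poor channel
```

With a generous budget every patch reaches the finest level; as the budget shrinks, only the most-attended patches keep a high resolution and the rest fall back to the coarsest one. This mirrors the behavior described above, where reconstruction quality at the receiver follows semantic importance.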