Abstract: In emergencies, the ability to quickly and accurately gather environmental data and command information, and to make timely decisions, is particularly critical. Traditional semantic communication frameworks, primarily based on a single modality, are susceptible to complex environments and lighting conditions, thereby limiting decision accuracy. To this end, this paper introduces a multimodal generative semantic communication framework named mm-GESCO. The framework ingests streams of visible and infrared image data, generates fused semantic segmentation maps, and transmits them using a combination of one-hot encoding and zlib compression to improve transmission efficiency. At the receiving end, the framework can reconstruct the original multimodal images from the semantic maps. Additionally, a latent diffusion model based on contrastive learning is designed to align data from different modalities within the latent space, allowing mm-GESCO to reconstruct latent features of any modality presented at the input. Experimental results demonstrate that mm-GESCO achieves a compression ratio of up to 200 times, surpassing existing semantic communication frameworks and performing well in downstream tasks such as object classification and detection.
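
The transmit-side step named in the abstract, one-hot encoding followed by zlib compression, can be illustrated with a minimal sketch. It is a toy under stated assumptions rather than the authors' implementation: the bit-packing layout, the function names (`one_hot_pack`, `unpack_to_map`, `encode_map`, `decode_map`, `make_toy_map`), and the three-class example map are all hypothetical, and the size reduction it produces says nothing about the paper's reported figure of 200 times.

```python
import zlib


def one_hot_pack(seg_map, num_classes):
    """One-hot encode a 2D map of class ids and pack the bits into bytes.

    Layout (an assumption of this sketch): one binary plane per class,
    planes concatenated in class order, 8 pixels per byte, MSB first.
    """
    bits = []
    for c in range(num_classes):
        for row in seg_map:
            bits.extend(1 if label == c else 0 for label in row)
    bits.extend([0] * (-len(bits) % 8))  # pad to a whole number of bytes

    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return bytes(packed)


def unpack_to_map(packed, height, width, num_classes):
    """Invert one_hot_pack: recover the class id of every pixel."""
    seg_map = [[0] * width for _ in range(height)]
    for c in range(num_classes):
        for y in range(height):
            for x in range(width):
                idx = (c * height + y) * width + x
                if (packed[idx // 8] >> (7 - idx % 8)) & 1:
                    seg_map[y][x] = c
    return seg_map


def encode_map(seg_map, num_classes):
    """Sender side: one-hot bit planes, then zlib at maximum compression."""
    return zlib.compress(one_hot_pack(seg_map, num_classes), 9)


def decode_map(payload, height, width, num_classes):
    """Receiver side: decompress, then rebuild the map of class ids."""
    return unpack_to_map(zlib.decompress(payload), height, width, num_classes)


def make_toy_map(size=64):
    """Hypothetical 3-class map: class 0 on top, classes 1 and 2 below."""
    half = size // 2
    return [
        [0 if y < half else (1 if x < half else 2) for x in range(size)]
        for y in range(size)
    ]
```

Calling `encode_map(make_toy_map(), 3)` yields a byte string that `decode_map` turns back into the same map. Segmentation maps with large uniform regions compress well under this scheme because each class plane consists mostly of long runs of identical bits.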

What problem does this paper attempt to address?

The paper addresses the problem of efficiently transmitting multimodal images (specifically visible-light and infrared images) in emergency situations and reconstructing them through a semantic communication framework. It proposes a multimodal generative semantic communication framework named mm-GESCO, which targets the following key problems:

1. **Data Compression and Transmission Efficiency**: During disasters, infrastructure damage leaves no public network support, so drones must rely on temporarily deployed dedicated networks to transmit data. In such cases, traditional single-modal semantic communication frameworks are susceptible to complex environments and lighting conditions, which limits the accuracy of decision-making. The paper therefore generates semantic segmentation maps by fusing visible-light and infrared images and combines one-hot encoding with zlib compression to achieve a compression ratio of up to 200 times, improving data transmission efficiency.
2. **Multimodal Data Reconstruction**: Existing research mainly focuses on reconstructing single-modal data, which is limiting when several tasks must be served at once. The mm-GESCO framework uses a latent diffusion model combined with contrastive learning to align data from different modalities in the latent space. This lets a single model reconstruct whichever modality is indicated by the input modal information, reducing deployment costs in emergency situations.
3. **Downstream Task Performance**: Experimental results show that mm-GESCO performs well in downstream tasks such as object classification and detection, surpassing existing single-modal and multimodal semantic communication frameworks.

In summary, the paper aims to improve data transmission efficiency and multimodal reconstruction capability in emergency situations, in support of more efficient search and rescue missions.
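
The idea of aligning visible and infrared data in a shared latent space through contrastive learning can be sketched with a symmetric InfoNCE loss. This is only an illustration under assumptions: the summary does not state which contrastive objective mm-GESCO uses, so InfoNCE, the temperature value, and the function names (`l2_normalize`, `info_nce`) are choices made here for demonstration, and the latents are plain Python lists rather than outputs of a real encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(visible, infrared, temperature=0.07):
    """Symmetric InfoNCE over paired latents.

    visible[i] and infrared[i] are assumed to encode the same scene;
    every other pairing in the batch is treated as a negative.
    """
    vis = [l2_normalize(v) for v in visible]
    ir = [l2_normalize(v) for v in infrared]
    n = len(vis)

    # Cosine similarity between every visible/infrared pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(vis[i], ir[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the visible-to-infrared and infrared-to-visible directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(columns))
```

Minimizing such a loss pulls the latents of matching visible and infrared views together and pushes mismatched ones apart, which is one common way to obtain the kind of shared latent space the summary describes.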

Multimodal generative semantic communication based on latent diffusion model

Image Generation with Multimodule Semantic Feature-Aided Selection for Semantic Communications

Generative Semantic Communication: Diffusion Models Beyond Bit Recovery

Semantic Change Driven Generative Semantic Communication Framework

DMCE: Diffusion Model Channel Enhancer for Multi-User Semantic Communication Systems

Diffusion-based Generative Multicasting with Intent-aware Semantic Decomposition

Lightweight Diffusion Models for Resource-Constrained Semantic Communication

Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM

A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation

MultiSenseSeg: A Cost-Effective Unified Multimodal Semantic Segmentation Model for Remote Sensing

Large Generative Model Assisted 3D Semantic Communication

Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks

Rate-Adaptive Generative Semantic Communication Using Conditional Diffusion Models

Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models

Agent-driven Generative Semantic Communication with Cross-Modality and Prediction

Rethinking Multi-User Semantic Communications with Deep Generative Models

CASC: Condition-Aware Semantic Communication with Latent Diffusion Models

Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model

SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model

Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs