Collaborative Multimodal Vehicular Transformer Training Using Federated Learning

Xingjian Cao, Zonghang Li, Gang Sun, Hongfang Yu
DOI: https://doi.org/10.1109/vtc2024-spring62846.2024.10683376
2024-01-01
Abstract: The Internet of Vehicles (IoV) is an intricate ecosystem brimming with diverse data modalities, including visual streams from cameras, GPS-based location information, sensor-derived operational metrics, and auditory commands from users. These necessitate advanced multimodal learning capabilities. The Transformer architecture, a significant innovation in artificial intelligence, has demonstrated its proficiency in representing varied modalities, facilitating multimodal machine learning. However, its direct application within the IoV is hampered by concerns over user data privacy. Federated learning (FL), a distributed learning paradigm, offers a solution that upholds data privacy. We propose a novel multimodal Transformer-based federated learning framework that capitalizes on the Transformer's ability to effectively handle multimodal data, enabling collaborative learning across heterogeneous data sources. This framework aligns with stringent privacy regulations, enhancing the protection of user data privacy while also boosting learning efficiency. Our approach surpasses traditional models in both accuracy and efficiency, presenting a significant advance in the multimodal machine learning domain.