Abstract: Federated cross-modal retrieval trains a shared cross-modal retrieval model across decentralized clients, which reduces the high maintenance cost of centralized multimodal training data and addresses data privacy in distributed storage scenarios. However, most existing federated cross-modal retrieval methods rely on large numbers of semantic annotations, which limits the scalability of the retrieval model in large-scale applications. This paper proposes an unsupervised federated cross-modal hashing retrieval model that learns without semantic annotations while protecting the privacy of client data. Because multimodal data are unevenly distributed in a federated learning environment, local information alone is insufficient for the model to learn the inter-modal similarity of the overall data, which degrades retrieval performance. To solve this problem, this paper proposes a global-local inter-modal contrastive regularization, which constrains the local hashing model of one modality with the global hashing model of a different modality. This allows the local hashing model to perceive the overall semantic similarity of the data and strengthens the supervision of local cross-modal hash learning. Moreover, this paper introduces a global-local intra-modal knowledge distillation strategy to further acquire modality-specific global knowledge. Experimental results on five benchmark cross-modal retrieval datasets demonstrate the effectiveness of the proposed method.
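
As a rough illustration of the global-local contrastive regularization described in the abstract, the following minimal sketch computes an InfoNCE-style loss between relaxed hash codes from a local model of one modality and codes from a global model of the other modality. The abstract does not give the loss, so the InfoNCE form, the cosine similarity, the `temperature` value, and the names `cosine` and `contrastive_regularizer` are all assumptions made for illustration, not the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero real-valued code vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_regularizer(local_codes, global_codes, temperature=0.5):
    """Assumed InfoNCE-style regularizer (hypothetical, not from the paper).

    local_codes[i]  : relaxed hash code of sample i from the local model
                      of one modality (e.g. image).
    global_codes[i] : hash code of the paired sample i from the global
                      model of the other modality (e.g. text).
    The paired code is the positive; all other global codes in the batch
    serve as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(local_codes):
        logits = [cosine(anchor, g) / temperature for g in global_codes]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the positive pair.
        total += log_denominator - logits[i]
    return total / len(local_codes)
```

Under these assumptions, the loss is small when each local code agrees with its paired global code and large when the pairing is broken, which is the behavior a regularizer of this kind would need in order to pull local codes toward the global similarity structure.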

Federated unsupervised cross-modal hashing