FedDMF: Privacy-Preserving User Attribute Prediction using Deep Matrix Factorization

Ming Cheung
DOI: https://doi.org/10.48550/arXiv.2312.15420
2023-12-24
Abstract: User attribute prediction is a crucial task in various industries. However, sharing user data across different organizations faces challenges due to privacy concerns and legal requirements regarding personally identifiable information. Regulations such as the General Data Protection Regulation (GDPR) in the European Union and the Personal Information Protection Law of the People's Republic of China impose restrictions on data sharing. To address the need for utilizing features from multiple clients while adhering to legal requirements, federated learning algorithms have been proposed. These algorithms aim to predict user attributes without directly sharing the data. However, existing approaches typically rely on matching users across companies, which can result in dishonest partners discovering user lists or the inability to utilize all available features. In this paper, we propose a novel algorithm for predicting user attributes without requiring user matching. Our approach involves training deep matrix factorization models on different clients and sharing only the item vectors. This allows us to predict user attributes without sharing the user vectors themselves. We evaluate the algorithm on the publicly available MovieLens dataset and demonstrate that it achieves performance similar to the FedAvg algorithm, reaching 96% of a single model's accuracy. The proposed algorithm is particularly well-suited for improving customer targeting and enhancing the overall customer experience. This paper presents a valuable contribution to the field of user attribute prediction by offering a novel algorithm that addresses some of the most pressing privacy concerns in this area.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is privacy protection in user attribute prediction. Sharing user data across organizations is difficult because of privacy concerns and strict legal restrictions on personally identifiable information (PII), such as the General Data Protection Regulation (GDPR) in the European Union and the Personal Information Protection Law (PIPL) in China.

### Problem Background

1. **Privacy and Legal Requirements**: Sharing user data across organizations must comply with strict privacy regulations, which makes direct sharing of user data very difficult.
2. **Limitations of Existing Methods**:
   - Existing federated learning algorithms can train models without directly sharing data, but they usually need to match users across companies. This can leak user lists to dishonest partners or prevent all available features from being used.
   - Horizontal Federated Learning (HFL) and Vertical Federated Learning (VFL) each have limitations: HFL can only use features common to all clients, while VFL requires user matching and carries privacy risks.

### The Method Proposed in the Paper

To solve these problems, the paper proposes a new algorithm, FedDMF (Federated Deep Matrix Factorization), with the following characteristics:

- **No Need for User Matching**: Each client trains a Deep Matrix Factorization (DMF) model, and only item vectors are shared among clients, not user vectors, so users never need to be matched.
- **Fully Utilize All Features**: Each client trains its model independently and shares item vectors, so every client's features can be used without sharing user-level data.
- **Privacy Protection**: User attributes can be predicted without exposing user identities, because user vectors stay on the client.
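The item-vector sharing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses plain (non-deep) matrix factorization in place of DMF, assumes a simple average as the aggregation rule, and omits the attribute prediction step. The names `local_step`, `federated_round`, and `total_loss`, the toy data, and the hyperparameters are all hypothetical.

```python
import numpy as np

def local_step(R, U, V, lr=0.05, reg=0.01):
    """One gradient step of plain matrix factorization on one client's data."""
    err = R - U @ V.T                        # reconstruction error, (users, items)
    U_new = U + lr * (err @ V - reg * U)     # user vectors: stay on the client
    V_new = V + lr * (err.T @ U - reg * V)   # item vectors: the only thing shared
    return U_new, V_new

def federated_round(clients, V_global):
    """Clients update locally; item vectors are averaged (assumed aggregation)."""
    item_updates = []
    for c in clients:
        c["U"], V_local = local_step(c["R"], c["U"], V_global)
        item_updates.append(V_local)
    return np.mean(item_updates, axis=0)

def total_loss(clients, V):
    """Sum of squared reconstruction errors over all clients."""
    return sum(float(np.sum((c["R"] - c["U"] @ V.T) ** 2)) for c in clients)

rng = np.random.default_rng(0)
n_items, dim = 6, 3

# Two clients with different, unmatched users over the same item catalogue.
clients = [
    {"R": rng.integers(0, 2, size=(4, n_items)).astype(float),
     "U": rng.normal(scale=0.1, size=(4, dim))},
    {"R": rng.integers(0, 2, size=(5, n_items)).astype(float),
     "U": rng.normal(scale=0.1, size=(5, dim))},
]
V_global = rng.normal(scale=0.1, size=(n_items, dim))

loss_before = total_loss(clients, V_global)
for _ in range(200):
    V_global = federated_round(clients, V_global)
loss_after = total_loss(clients, V_global)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Only `V_global` and the per-client item updates cross the client boundary in this sketch; each client's `U` is read and written locally, which is the property that removes the need for user matching.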
### Experimental Verification

To verify the effectiveness of FedDMF, the author ran experiments on the public MovieLens dataset. The results show that FedDMF performs comparably to the FedAvg algorithm, reaching 96% of the accuracy of a single model. This indicates that FedDMF can protect privacy while retaining most of the prediction accuracy.

### Application Prospects

The algorithm is especially suitable for the retail industry, where it can improve customer targeting and enhance the overall customer experience. It also offers a novel approach to the privacy problem in user attribute prediction.

In summary, the main contribution of this paper is FedDMF, a new algorithm that predicts user attributes without violating privacy regulations while making full use of the data features held by multiple clients.