Privacy-Preserving Data Linkage Across Private and Public Datasets for Collaborative Agriculture Research

Osama Zafar, Rosemarie Santa Gonzalez, Gabriel Wilkins, Alfonso Morales, Erman Ayday
2024-09-10
Abstract: Digital agriculture leverages technology to enhance crop yield, disease resilience, and soil health, playing a critical role in agricultural research. However, it raises privacy concerns such as adverse pricing, price discrimination, higher insurance costs, and manipulation of resources, deterring farm operators from sharing data due to potential misuse. This study introduces a privacy-preserving framework that addresses these risks while allowing secure data sharing for digital agriculture. Our framework enables comprehensive data analysis while protecting privacy. It allows stakeholders to harness research-driven policies that link public and private datasets. The proposed algorithm achieves this by: (1) identifying similar farmers based on private datasets, (2) providing aggregate information like time and location, (3) determining trends in price and product availability, and (4) correlating trends with public policy data, such as food insecurity statistics. We validate the framework with real-world Farmer's Market datasets, demonstrating its efficacy through machine learning models trained on linked privacy-preserved data. The results support policymakers and researchers in addressing food insecurity and pricing issues. This work significantly contributes to digital agriculture by providing a secure method for integrating and analyzing data, driving advancements in agricultural technology and development.
Machine Learning, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
The paper addresses privacy protection in data sharing for digital agriculture. When farmers share data about their field environments and soil conditions, they face privacy risks such as unfavorable pricing, price discrimination, increased insurance costs, and manipulation of important agricultural resources. These risks make farmers reluctant to share data with external researchers and institutions, which limits agricultural research. The paper therefore proposes a privacy-preserving framework that mitigates these risks while still permitting comprehensive data analysis. By establishing meaningful links between public and private digital agriculture datasets, the framework lets the agricultural community benefit from research-based policy-making.

The core technologies of this framework include:

- **Principal Component Analysis (PCA)**: reduces the dimensionality of the data while retaining the main information.
- **Local Differential Privacy (LDP)**: protects individual data by adding noise locally, so privacy holds even if the data collector is untrusted.
- **K-means clustering**: identifies farmers with specific attributes and groups them for further study.

Together, these techniques protect farmers' data privacy while keeping aggregated data useful for a wider range of AI-driven agricultural research. The paper also validates the framework on real-world cases, showing how various machine-learning models can be trained on linked private and public data without disclosing personal information, in support of policymakers and researchers.
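To make the combination of local noise and clustering concrete, the following is a minimal sketch, not the authors' implementation. It assumes each farmer holds a small numeric record, perturbs it on the farmer's side with the Laplace mechanism (one common way to obtain local differential privacy; the summary does not say which mechanism the paper uses), and then runs plain k-means on the noisy records. The PCA step is omitted for brevity. The feature names, value ranges, `epsilon` value, and the helper functions `perturb_record` and `kmeans` are all hypothetical and chosen only for illustration.

```python
import random


def perturb_record(record, bounds, epsilon, rng):
    """Perturb one record locally with the Laplace mechanism (assumed mechanism).

    Each feature is clipped to its [lo, hi] range and rescaled to [0, 1], so its
    sensitivity is 1. The budget epsilon is split evenly across the features.
    """
    eps_per_feature = epsilon / len(record)
    scale = 1.0 / eps_per_feature  # Laplace scale = sensitivity / epsilon
    noisy = []
    for value, (lo, hi) in zip(record, bounds):
        unit = (min(max(value, lo), hi) - lo) / (hi - lo)
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(unit + noise)
    return noisy


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: (price per unit, weekly volume) for two groups of vendors.
rng = random.Random(0)
bounds = [(0.0, 10.0), (0.0, 500.0)]
records = (
    [[2.0 + rng.random(), 100.0 + 20.0 * rng.random()] for _ in range(20)]
    + [[7.0 + rng.random(), 400.0 + 20.0 * rng.random()] for _ in range(20)]
)

# Each farmer perturbs their own record before it leaves their device.
noisy = [perturb_record(r, bounds, epsilon=4.0, rng=rng) for r in records]

# The (untrusted) collector only ever sees the noisy records.
labels, centroids = kmeans(noisy, k=2, rng=rng)
print(labels)
```

Smaller values of `epsilon` give stronger privacy but noisier records, so the recovered groups become less reliable. In a fuller pipeline a dimensionality-reduction step such as PCA would sit alongside the noise step, but where exactly it goes is a design choice this sketch does not settle.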