Abstract: Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases.
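
For reference, the guarantee described informally in the abstract is usually written as follows. This is the standard textbook definition of pure ε-differential privacy; the symbols M, x, x', S and ε are notation chosen here rather than taken from the abstract, and the paper itself may work with a variant of the definition. A randomized algorithm M is ε-differentially private if, for every pair of datasets x and x' that differ in a single record and every set S of possible outputs:

```latex
\Pr[M(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(x') \in S]
```

Smaller values of ε force the two output distributions closer together, so the output reveals less about any one record.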

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper explores how to carry out simple linear regression while protecting individual privacy. Specifically, it focuses on designing differentially private (DP) algorithms for simple linear regression on small datasets that contain sensitive information.

#### Main Issues

1. **Privacy Protection on Small Datasets**: Economics and social science research often requires fine-grained analysis of datasets containing sensitive personal information. Such analysis can reveal information about individuals, so keeping statistical estimates valid while protecting privacy becomes a key issue.
2. **Simple Linear Regression under Differential Privacy**: The paper investigates how to design simple linear regression algorithms that satisfy differential privacy, so that the algorithm's output reveals little about any individual input data record.
3. **Challenges of Small Datasets**: With small amounts of data (tens to hundreds of data points), designing effective differentially private algorithms is particularly challenging.

#### Specific Goals

- Provide a differentially private algorithm such that, when performing simple linear regression on small datasets, the added noise does not significantly increase uncertainty.
- Validate the performance of different algorithms under various parameter settings through experiments and find the most suitable method for practical applications.
- Pay special attention to the "Opportunity Atlas" tool in economics, which is used to study the relationship between the environments in which children grow up and their economic mobility. Since the datasets are usually small (100 to 400 data points), effective differentially private algorithms are needed to protect this data.

#### Methods and Results

- Several differentially private algorithms based on robust estimators (such as the Theil-Sen estimator) were studied and compared with other standard methods.
- Algorithms based on the Theil-Sen estimator performed best on the smallest datasets, but as the dataset size increased, other standard algorithms performed better.
- A series of experiments demonstrated that, for a wide range of real-world datasets and moderate privacy parameter values, a differentially private linear regression algorithm could be found with an error smaller than the standard error.

### Conclusion

The paper proposes a new differentially private algorithm, DPExpTheilSen, which performs best in a variety of scenarios. Additionally, the paper discusses the applicability of different algorithms under different dataset attributes, providing valuable insights for further research.
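
To make the Theil-Sen-based idea summarized above more concrete, here is a minimal Python sketch of one way such an estimator could be made differentially private. It is an illustration under stated assumptions, not the paper's DPExpTheilSen implementation: the function names `dp_median_exp` and `dp_theil_sen_match`, the random matching of points into disjoint pairs, the caller-supplied slope bounds `slope_lo` and `slope_hi`, and the convention for pairs with equal x values are all choices made here. Because each data point falls in at most one pair, replacing one record changes at most one slope, which is what keeps the sensitivity of the rank-based utility at 1.

```python
import numpy as np


def dp_median_exp(values, lo, hi, epsilon, rng):
    """Sample an approximate median of `values` (clipped to [lo, hi])
    with the exponential mechanism over the intervals between sorted values."""
    assert lo < hi and epsilon > 0
    z = np.sort(np.clip(values, lo, hi))
    m = len(z)
    edges = np.concatenate(([lo], z, [hi]))  # m + 2 edges, m + 1 intervals
    lengths = np.diff(edges)
    # Interval i has exactly i values at or below it; utility is the negative
    # distance of that rank from the median rank (sensitivity 1).
    utility = -np.abs(np.arange(m + 1) - m / 2.0)
    # Work in log space; zero-length intervals get log-weight -inf.
    with np.errstate(divide="ignore"):
        logw = np.log(lengths) + epsilon * utility / 2.0
    probs = np.exp(logw - logw.max())
    probs /= probs.sum()
    i = rng.choice(m + 1, p=probs)
    return rng.uniform(edges[i], edges[i + 1])


def dp_theil_sen_match(x, y, epsilon, slope_lo, slope_hi, seed=None):
    """Hypothetical matched-pairs DP Theil-Sen slope estimate.

    Assumption: the true slope is believed to lie in [slope_lo, slope_hi];
    these bounds must be chosen without looking at the data.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = len(x) // 2
    # Data-independent random matching into disjoint pairs
    # (one point is left out when n is odd).
    perm = rng.permutation(len(x))
    a, b = perm[:half], perm[half:2 * half]
    dx, dy = x[b] - x[a], y[b] - y[a]
    # Pairs with equal x have no defined slope; map them to the midpoint of
    # the allowed range (an arbitrary convention chosen for this sketch).
    mid = (slope_lo + slope_hi) / 2.0
    slopes = np.where(dx == 0, mid, dy / np.where(dx == 0, 1.0, dx))
    return dp_median_exp(slopes, slope_lo, slope_hi, epsilon, rng)
```

A call such as `dp_theil_sen_match(x, y, epsilon=1.0, slope_lo=-10, slope_hi=10)` returns a single noisy slope estimate. Using every pair of points rather than a matching would give more slopes to take the median of, but each record would then influence many slopes, so the privacy analysis would need to account for that larger influence.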

Differentially Private Simple Linear Regression

Analyzing the Differentially Private Theil-Sen Estimator for Simple Linear Regression

Differentially Private Regression with Unbounded Covariates

Better Private Linear Regression Through Better Private Feature Selection

Differentially Private Linear Regression Analysis via Truncating Technique

Differentially Private Model Selection with Penalized and Constrained Likelihood

A Survey of Differentially Private Regression for Clinical and Epidemiological Research

Private Linear Regression with Differential Privacy and PAC Privacy

Differentially Private Linear Regression over Fully Decentralized Datasets

Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain

Differentially Private Learning Beyond the Classical Dimensionality Regime

Differentially Private Algorithms for Empirical Machine Learning

Privacy-Preserving Algorithms for Machine Learning

Differentially Private Generalized Linear Models Revisited

Median Regression with Differential Privacy

The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy

Identification and Formal Privacy Guarantees

Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

Differential Privacy: An Economic Method for Choosing Epsilon

Differentially Private Learning with Small Public Data.

Differentially Private Model Personalization