Abstract: In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods could also be used for other private PAC learning problems.

What problem does this paper attempt to address?

### The problems the paper attempts to solve

This paper studies how to PAC learn halfspaces in the non-interactive local differential privacy (NLDP) model when the server also has access to public unlabeled data. Specifically, the authors aim to overcome the barrier of exponential sample complexity and to propose methods whose sample complexity significantly improves on previous results.

#### Background and motivation

1. **Privacy protection requirements**: As sensitive data is generated and collected at large scale, analyzing it without exposing personal information has become an important issue. Differential privacy (DP) has become the de facto tool for this purpose.
2. **Existing challenges**: Because the NLDP model restricts the number of communication rounds, it is harder to analyze than other privacy models. In particular, Daniely and Feldman (2019) proved that learning halfspaces requires exponential sample complexity even under the large-margin assumption. To address this, Daniely and Feldman introduced a relaxed NLDP model in which the server can access some public but unlabeled data.
3. **Improvement goals**: This paper aims to further reduce the sample complexity under the standard setting (rather than the large-margin setting) when public unlabeled data is available, so that the sample complexity is linear in the dimension and polynomial in the other parameters.

#### Main contributions

1. **Anti-anti-concentration property**: The authors first study the case where the data distribution satisfies the anti-anti-concentration and anti-concentration properties, and propose an (ε, δ)-NLDP algorithm based on the Massart noise model whose sample complexity is linear in the dimension.
2. **Self-supervised learning**: To further reduce the sample complexity of public data, the authors study a self-supervised learning method for the case of mixed distributions and propose an algorithm that achieves O(d/α²) sample complexity.

### Formula summary

- **Sample complexity**:
  - Private data: \(\tilde{O}(d\cdot\text{Poly}(1/\epsilon, 1/\alpha))\)
  - Public unlabeled data: \(O(d/\alpha^4)\) or \(O(d/\alpha^2)\)
- **Massart noise model**:
  - Each sample's label is flipped with probability at most λ < 1/2, that is:
    \[ y = \begin{cases} f(x), & \text{with probability } 1 - \lambda(x)\\ -f(x), & \text{with probability } \lambda(x) \end{cases} \]
    where \(\lambda(x)\leq\lambda\).
- **Anti-anti-concentration property**:
  - For any 2-dimensional subspace V, the density γ_V of the distribution projected onto V satisfies:
    \[ \gamma_V(x)\leq U\quad\forall x\in V \]
    and, for all points with \(\|x\|_2\leq r\):
    \[ \gamma_V(x)\geq\frac{1}{U} \]

Together, these results substantially reduce the sample complexity of PAC learning halfspaces under NLDP compared with previous work.
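The Massart noise model above can be illustrated with a short simulation. This is a minimal sketch and not the paper's algorithm: the function `sample_massart_labels`, the flip-rate function `flip_prob`, the Gaussian data, and the values of `d`, `n`, and `lam` are all hypothetical choices made for illustration. It labels points with a halfspace f(x) = sign(⟨w, x⟩) and flips each label independently with a point-dependent probability λ(x) that never exceeds λ.

```python
import numpy as np

def sample_massart_labels(X, w, flip_prob, rng):
    """Label points with the halfspace sign(<w, x>), then flip each label
    independently with its own probability flip_prob(X)[i]."""
    clean = np.where(X @ w >= 0, 1, -1)       # f(x) in {-1, +1}
    lam_x = flip_prob(X)                      # per-point flip rates lambda(x)
    flips = rng.random(len(X)) < lam_x        # which labels get flipped
    noisy = np.where(flips, -clean, clean)    # y = -f(x) where flipped
    return clean, noisy, lam_x

rng = np.random.default_rng(0)
d, n, lam = 5, 20000, 0.2                     # hypothetical values, lam < 1/2
w = np.zeros(d)
w[0] = 1.0                                    # hypothetical target halfspace
X = rng.standard_normal((n, d))               # assumed Gaussian marginal

def flip_prob(X):
    # Hypothetical lambda(x): noisier near the decision boundary, capped at lam.
    return lam * np.exp(-np.abs(X @ w))

clean, noisy, lam_x = sample_massart_labels(X, w, flip_prob, rng)
observed_flip_rate = np.mean(clean != noisy)
print(f"max lambda(x) = {lam_x.max():.3f}, observed flip rate = {observed_flip_rate:.3f}")
```

Because the flip rate varies with x and is only bounded by λ, the observed fraction of flipped labels can be well below λ. This is what distinguishes Massart noise from uniform random classification noise, where every label is flipped with the same probability.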

On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data
