Abstract: Background: Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the use of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, because of the risk of marginalization and stigmatization. Results: Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples held at two different sites while the genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. Conclusions: Analysis of relatedness is fundamentally important for identifying relatives, for association studies, and for population-level estimates of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.
Short abstract: Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies, and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements and do not always account for individuals with admixed ancestry. Furthermore, the ethical concerns around using genetic data and calculating relatedness are generally overlooked. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.
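
To make the idea of a projection-based estimate concrete, here is a minimal sketch and not the published SIGFRIED implementation. It assumes that each genotype vector is projected onto reference-panel allele frequencies by ordinary least squares to obtain admixture weights, that individual-specific allele frequencies are the weighted mix of the panel frequencies, and that kinship is a frequency-adjusted genotype correlation. The function names (`admixture_weights`, `individual_freqs`, `kinship`) and the clipping constant `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def admixture_weights(genotypes, ref_freqs):
    """Project one genotype vector onto reference-panel allele frequencies.

    genotypes: (M,) alternate-allele counts in {0, 1, 2}
    ref_freqs: (K, M) allele frequencies of K reference populations
    Returns K non-negative weights that sum to 1 (an assumed, simplified
    stand-in for a proper admixture estimator).
    """
    # Ordinary least squares: genotypes / 2 ~= weights @ ref_freqs
    w, *_ = np.linalg.lstsq(ref_freqs.T, genotypes / 2.0, rcond=None)
    w = np.clip(w, 0.0, None)          # drop negative weights
    total = w.sum()
    if total == 0.0:                   # degenerate fit: fall back to uniform
        return np.full(ref_freqs.shape[0], 1.0 / ref_freqs.shape[0])
    return w / total


def individual_freqs(genotypes, ref_freqs, eps=1e-3):
    """Individual-specific allele frequencies as a weighted panel mix."""
    w = admixture_weights(genotypes, ref_freqs)
    # Keep frequencies away from 0 and 1 so the variance term stays positive
    return np.clip(w @ ref_freqs, eps, 1.0 - eps)


def kinship(g_i, g_j, p_i, p_j):
    """Frequency-adjusted kinship between two individuals.

    Centers each genotype by its own expected value 2p and scales by the
    geometric mean of the two per-marker binomial variances.
    """
    num = np.sum((g_i - 2.0 * p_i) * (g_j - 2.0 * p_j))
    den = 4.0 * np.sum(np.sqrt(p_i * (1.0 - p_i) * p_j * (1.0 - p_j)))
    return num / den
```

Under these assumptions, unrelated individuals should score near 0 and an individual compared with itself near 0.5, even when the two samples have different admixture proportions, because each genotype is centered by its own expected allele frequencies rather than by a single pooled frequency.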

Estimating pairwise relatedness in a small sample of individuals

Correcting model misspecification in relationship estimates

Systematic bias in malaria parasite relatedness estimation

A Generalized Approach for Measuring Relationships among Genes

Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data

Estimates of heterozygosity from single nucleotide polymorphism markers are context‐dependent and often wrong

Rank-invariant estimation of inbreeding coefficients

Privacy-aware estimation of relatedness in admixed populations

Estimation of inbreeding and kinship coefficients via latent identity-by-descent states

Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach

Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data

Optimal Estimation Of Genetic Relatedness In High-Dimensional Linear Models

Effective Sample Size: Quick Estimation of the Effect of Related Samples in Genetic Case-Control Association Analyses

Purging putative siblings from population genetic data sets: a cautionary view

Relatedness coefficients and their applications for triplets and quartets of genetic markers

Estimating heterozygosity from a low-coverage genome sequence, leveraging data from other individuals sequenced at the same sites

Statistical Inference for Genetic Relatedness Based on High-Dimensional Logistic Regression

Ethnic-Affiliation Estimation by Use of Population-Specific DNA Markers

Correcting for Cryptic Relatedness in Genome-Wide Association Studies

Inferring Linkage Disequilibrium from Non-Random Samples

Do estimates of contemporary effective population size tell us what we want to know?