Abstract: This scoping review aims to identify and evaluate the landscape of Polygenic Risk Score (PRS)-based methods for genomic prediction published from 2013 to 2023, highlighting their advancements, key concepts, and the existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing a different statistical framework aimed at improving prediction accuracy, processing speed, and memory efficiency. Despite notable advancements, challenges persist. These include unrealistic assumptions about the sample sizes and degree of trait polygenicity needed for accurate prediction, limited exploration of hyper-parameter spaces, and limited consideration of environmental interactions. We included studies of PRS-based methods for risk prediction that were evaluated methodologically using valid approaches and that released computational tools or software. We further restricted the selection to studies involving human participants and published in English. The review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus, and Web of Science databases. The search also covered grey literature, including pre-print servers such as bioRxiv, and articles recommended by experts, to ensure comprehensive and diverse coverage of relevant records. We identified 34 studies describing 37 genomic prediction methods, most of which rely on linkage disequilibrium (LD) information and require hyper-parameter tuning. Nine methods integrate functional or gene annotations, 12 are suitable for cross-ancestry genomic prediction, and only one considers gene-environment (GxE) interaction. While some methods require individual-level data, most can be run on GWAS summary statistics alone, which offers greater flexibility. Despite this progress, challenges remain.
These include computational complexity and the need for large sample sizes to achieve high prediction accuracy. Furthermore, recent methods vary in effectiveness across traits, and their absolute accuracies often fall short of clinical utility. Transferability across ancestries also varies, depending on trait heritability and the diversity of the training data, and handling admixed populations remains challenging. In addition, current methods do not provide standard error estimates for individual PRSs. Because such estimates are crucial in clinical settings, this is a critical gap. Another issue is that current software packages lack customizable graphical visualization tools. While genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and pursuing future research directions will lead to more universally applicable, robust, and clinically relevant genomic prediction tools.
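
To make the core quantity concrete: an individual's PRS is typically a weighted sum of allele dosages, with one weight per variant derived from GWAS summary statistics. Many methods also need hyper-parameters tuned on a validation set. The following is a minimal, hypothetical Python sketch of both ideas. It uses plain p-value thresholding, which is only the thresholding half of the common clumping-and-thresholding approach, and it deliberately omits any LD handling. The function names (`prs`, `threshold_weights`, `tune_threshold`) and the toy data are illustrative assumptions, not taken from any of the reviewed tools.

```python
import math


def prs(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    return sum(x * w for x, w in zip(dosages, weights))


def threshold_weights(betas, pvalues, p_threshold):
    """Keep a GWAS effect size only if its p-value passes the threshold; zero it otherwise.
    Simplification: no LD clumping, so correlated variants are all kept."""
    return [b if p <= p_threshold else 0.0 for b, p in zip(betas, pvalues)]


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def tune_threshold(genotypes, phenotypes, betas, pvalues, grid):
    """Grid-search the p-value threshold (the hyper-parameter) and return
    (threshold, r2) for the value whose PRS best predicts the phenotype,
    measured as squared Pearson correlation on a validation set."""
    best = None
    for t in grid:
        weights = threshold_weights(betas, pvalues, t)
        scores = [prs(g, weights) for g in genotypes]
        r2 = pearson_r(scores, phenotypes) ** 2
        if best is None or r2 > best[1]:  # ties keep the earlier grid value
            best = (t, r2)
    return best


# Toy example (hypothetical data): 3 variants, 4 validation individuals.
betas = [0.5, -0.3, 0.05]    # GWAS effect sizes
pvalues = [1e-8, 1e-3, 0.4]  # GWAS p-values
genotypes = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 1, 1]]
phenotypes = [0.0, 0.5, 1.0, 0.5]

best_t, best_r2 = tune_threshold(genotypes, phenotypes, betas, pvalues, [1e-5, 1e-2, 1.0])
print(best_t, round(best_r2, 3))  # prints: 1e-05 1.0
```

Tuning the threshold on the same individuals used to report accuracy would overstate performance. The validation set should therefore be separate from both the GWAS sample and the final test set. Many of the summary-statistics methods covered by this review replace crude thresholding with LD-aware estimation of the per-variant weights, but still combine those weights with genotypes in a weighted sum of this form.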

Advancements and limitations in polygenic risk score methods for genomic prediction: a scoping review