Abstract:
Background: In the United States, about 3 million people have autism spectrum disorder (ASD), and about 1 in 59 children is diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because it enables timely interventions in affected children. Identification of ASD based on objective pathogenic mutation screening is the major first step toward early intervention and effective treatment of affected children.
Objective: Recent investigations have interrogated genomic data for detecting and treating autism, in addition to the conventional clinical interview used as a diagnostic test. Because deep neural networks perform better than shallow machine learning models on complex and high-dimensional data, in this study we sought to apply deep learning to genetic data obtained across thousands of simplex families at risk for ASD to identify contributory mutations and to create an advanced diagnostic classifier for autism screening.
Methods: After preprocessing the genomic data from the Simons Simplex Collection, we extracted the top-ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network–based diagnostic classifier was then designed using the identified significant common variants to predict autism. Its performance was then compared with that of shallow machine learning–based classifiers and with that obtained using randomly selected common variants.
Results: The selected contributory common variants were significantly enriched in chromosome X, while chromosome Y was also discriminatory in distinguishing autistic individuals from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effect among the contributory variants. Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% for distinguishing autistic individuals from nonautistic individuals. Our classifier demonstrated a considerable improvement of ~13% in classification accuracy compared with standard autism screening tools.
Conclusions: Common variants are informative for autism identification. Our findings also suggest that deep learning is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism.
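The chi-square ranking step described in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how the contingency tables were built, so the sketch assumes genotypes coded as 0, 1, or 2 copies of the minor allele and a label-by-genotype table per variant. The function names (`chi_square_stat`, `rank_variants`), the variant identifiers, and the toy data are all hypothetical.

```python
from collections import Counter


def chi_square_stat(genotypes, labels):
    """Pearson chi-square statistic for a (label x genotype) contingency table.

    genotypes: per-individual genotype codes (assumed 0/1/2 minor-allele counts).
    labels:    per-individual status (assumed 1 = case, 0 = control).
    """
    n = len(labels)
    observed = Counter(zip(labels, genotypes))  # cell counts; missing cells count as 0
    row_totals = Counter(labels)
    col_totals = Counter(genotypes)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


def rank_variants(genotype_matrix, labels, top_k):
    """Return the top_k variant IDs ordered by decreasing chi-square statistic.

    genotype_matrix: dict mapping a variant ID to its list of genotype codes.
    """
    scores = {vid: chi_square_stat(g, labels) for vid, g in genotype_matrix.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
genotype_matrix = {
    "var_a": [2, 2, 1, 2, 0, 0, 1, 0],  # strongly differs between groups
    "var_b": [0, 1, 2, 0, 0, 1, 2, 0],  # identical distribution in both groups
    "var_c": [1, 1, 1, 0, 0, 0, 1, 0],  # mildly differs between groups
}

top_variants = rank_variants(genotype_matrix, labels, top_k=2)
print(top_variants)
```

In this sketch the top-ranked variants would then serve as input features for a downstream classifier. A real analysis would also convert each statistic to a p-value and correct for multiple testing, which is omitted here.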

Using Genomic Context Informed Genotype Data and Within‐model Ancestry Adjustment to Classify Type 2 Diabetes

Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning

A Non-Parametric Method for Building Predictive Genetic Tests on High-Dimensional Data

Prediction of type 1 diabetes using a genetic risk model in the Diabetes Autoimmunity Study in the Young

AI-driven Integration of Multimodal Imaging Pixel Data and Genome-wide Genotype Data Enhances Precision Health for Type 2 Diabetes: Insights from a Large-scale Biobank Study

Extracting Epistatic Interactions in Type 2 Diabetes Genome-Wide Data Using Stacked Autoencoder

Improving genetic risk prediction across diverse population by disentangling ancestry representations

Genomic Prediction of Complex Disease Risk

Enhancing schizophrenia phenotype prediction from genotype data through knowledge-driven deep neural network models

Genomic annotation of disease-associated variants reveals shared functional contexts

A Probabilistic Model to Predict Clinical Phenotypic Traits from Genome Sequencing

Using Recurrent Neural Networks for Predicting Type-2 Diabetes from Genomic and Tabular Data

Performance of deep-learning based approaches to improve polygenic scores

A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants

Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data

Genetic risk prediction in complex disease

Improving genetic risk modeling of dementia from real-world data in underrepresented populations

Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank

Interpreting artificial neural networks to detect genome-wide association signals for complex traits

Deep Learning-Based Polygenic Risk Analysis for Alzheimer’s Disease Prediction

AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes