Prediction of the Age at Onset of Spinocerebellar Ataxia Type 3 with Machine Learning

Linliu Peng, Zhao Chen, Tiankai Chen, Lijing Lei, Zhe Long, Mingjie Liu, Qi Deng, Hongyu Yuan, Guangdong Zou, Linlin Wan, Chunrong Wang, Huirong Peng, Yuting Shi, Puzhi Wang, Yun Peng, Shang Wang, Lang He, Yue Xie, Zhichao Tang, Na Wan, Yiqing Gong, Xuan Hou, Lu Shen, Kun Xia, Jinchen Li, Chao Chen, Zuping Zhang, Rong Qiu, Beisha Tang, Hong Jiang
DOI: https://doi.org/10.1002/mds.28311
IF: 9.698
2020-09-29
Movement Disorders
Abstract:<section class="article-section__content"><h3 class="article-section__sub-title section1"> Background</h3><p>In polyglutamine (polyQ) diseases, predicting a patient's age at onset (AAO) facilitates the development of disease‐modifying interventions and underpins efforts to delay disease onset and progression. Few polyQ disease studies have evaluated AAO prediction with machine‐learning algorithms and linear regression methods.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Objective</h3><p>The objective of this study was to develop a machine‐learning model for AAO prediction in the largest spinocerebellar ataxia type 3/Machado–Joseph disease (SCA3/MJD) population from mainland China.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Methods</h3><p>In this observational study, we systematically compared the performance of 7 machine‐learning algorithms with that of linear regression for AAO prediction in SCA3/MJD, using the CAG expansions of 10 polyQ‐related genes, sex, and parental origin as predictors.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Results</h3><p>Prediction performance was similar between the training and testing sets for each model, and little overfitting to the training data was observed. Overall, the machine‐learning‐based XGBoost model outperformed the traditional linear regression method and the other 6 machine‐learning algorithms in AAO prediction for both the training and testing sets. 
The optimal XGBoost model achieved a mean absolute error, root mean square error, and median absolute error of 5.56, 7.13, and 4.15 years, respectively, in testing set 1, and of 4.78, 6.31, and 3.59 years, respectively, in testing set 2.</p></section><section class="article-section__content"><h3 class="article-section__sub-title section1"> Conclusion</h3><p>Machine‐learning algorithms can be used to predict AAO in patients with SCA3/MJD. The optimal XGBoost model can serve as a useful reference for establishing and optimizing prediction models for SCA3/MJD and other polyQ diseases. © 2020 International Parkinson and Movement Disorder Society</p></section>
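The three error metrics reported in the abstract (mean absolute error, root mean square error, and median absolute error, all in years) can be computed from observed and predicted AAO values as below. This is a minimal sketch using the standard definitions and only the Python standard library; the abstract does not describe the authors' evaluation code, and the function name `regression_errors` and the example ages are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def regression_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: MAE, RMSE, and median absolute error.

    Results are in the same units as the inputs (years, for AAO).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Absolute error per patient: |observed AAO - predicted AAO|
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    mae = sum(abs_err) / n                            # mean absolute error
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)  # root mean square error
    median_ae = median(abs_err)                        # median absolute error
    return {"mae": mae, "rmse": rmse, "median_ae": median_ae}


if __name__ == "__main__":
    # Hypothetical observed vs. predicted ages at onset (years), not study data
    observed = [30, 40, 50, 35]
    predicted = [33, 38, 44, 35]
    print(regression_errors(observed, predicted))
    # → {'mae': 2.75, 'rmse': 3.5, 'median_ae': 2.5}
```

Because RMSE squares each error before averaging, it weights large errors more heavily and is never smaller than the MAE, which matches the ordering of the reported values in both testing sets.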
clinical neurology