Statistical analysis for the development of a Deep Learning model for classification of images with TDP-43 pathology

Azucena Muñoz, Vasco Oliveira, Marta Vallejo
DOI: https://doi.org/10.1101/2024.02.12.24300689
2024-02-14
Abstract: Diagnosing Amyotrophic Lateral Sclerosis (ALS) remains a hard challenge due to its inherent heterogeneity. Notably, TDP-43 cytoplasmic aggregation occurs in approximately 95% of ALS cases and has emerged as a potential hallmark of the disease. To develop deep learning models capable of distinguishing TDP-43 proteinopathic samples from their healthy counterparts, a comprehensive understanding of the sample set is imperative, particularly when the sample size is limited. The samples were images obtained via an immunofluorescence procedure, using super high-resolution microscopy coupled with meticulous processing. A feature-extracted dataset was created to collect meaningful features from every sample and to approach three different classification problems (TDP-43 Pathology, TDP-43 Pathology Grades and ALS) based on the number of red and pink pixels, signifying cytoplasmic and nuclear TDP-43 presence. Several statistical approaches were applied. The results were not definitive, although they suggested that a classification based on the presence of TDP-43 proteinopathy was better suited to training the model than one based on the presence of ALS. The dataset was then reduced by eliminating the problematic samples through curation. Analyses were repeated using Student's t-tests and ANOVA, and patient inter-variability was visualised using hierarchical clustering. The TDP-43 pathology classification results showed significant differences between healthy and pathological samples in the number of red and pink pixels, the total amount of protein, and the cytoplasmic and nuclear proportions. These findings suggested that images classified according to the presence of TDP-43 proteinopathy are more suitable for training deep learning models.
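The feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the colour thresholds, the function names and the pixel representation are all hypothetical, and it assumes that red pixels mark cytoplasmic TDP-43, that pink pixels mark nuclear TDP-43, and that "total amount of protein" means the sum of the two counts.

```python
from typing import Dict, Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0-255

# Hypothetical thresholds, chosen only for illustration.
RED_MIN = 150    # minimum red intensity for a TDP-43 signal
BLUE_MIN = 150   # minimum blue intensity for a pixel to count as pink
OTHER_MAX = 100  # maximum intensity for a channel to count as "off"


def classify_pixel(pixel: Pixel) -> str:
    """Label a pixel as 'red', 'pink' or 'other' using the assumed thresholds."""
    r, g, b = pixel
    if r >= RED_MIN and g <= OTHER_MAX:
        if b >= BLUE_MIN:
            return "pink"  # assumed: nuclear TDP-43
        if b <= OTHER_MAX:
            return "red"   # assumed: cytoplasmic TDP-43
    return "other"


def extract_features(pixels: Iterable[Pixel]) -> Dict[str, float]:
    """Count red and pink pixels and derive the proportions named in the abstract."""
    red = 0
    pink = 0
    for pixel in pixels:
        label = classify_pixel(pixel)
        if label == "red":
            red += 1
        elif label == "pink":
            pink += 1
    total = red + pink  # assumed proxy for the total amount of protein
    return {
        "red_pixels": red,
        "pink_pixels": pink,
        "total_protein": total,
        "cytoplasmic_proportion": red / total if total else 0.0,
        "nuclear_proportion": pink / total if total else 0.0,
    }
```

Applied to every image, a function of this kind would yield one feature row per sample, which is the form of input that t-tests, ANOVA and hierarchical clustering operate on.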
Neurology