Assessment of Machine Learning Methods for Classification in Single Cell ATAC-seq

Zhe Cui, Liran Juan, Tao Jiang, Bo Liu, Tianyi Zang, Yadong Wang
DOI: https://doi.org/10.1109/BIBM49941.2020.9313138
2020-01-01
Abstract: Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. The similarity in data structure and features between scRNA-seq and scATAC-seq makes it feasible to identify cell types in scATAC-seq with traditional supervised machine learning methods. Here, we evaluated 6 popular machine learning methods for classification in scATAC-seq. Performance was evaluated on 4 public single-cell ATAC-seq datasets from different tissues, of different sizes, and generated with different technologies. We first ran intra-dataset experiments with 5-fold cross-validation, scoring each method on accuracy, recall, and percentage of correctly predicted cells. We found that these methods may perform well on some cell types within a single dataset, but the overall results are not as good as in scRNA-seq analysis. To test classification ability across datasets, we also ran inter-dataset experiments, which assess the methods in more realistic scenarios. SVM and NMC are the two best-performing methods overall across all experiments. We recommend that researchers use SVM or NMC as the underlying classifier when developing an automatic classification method for scATAC-seq.
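As an illustration of the intra-dataset protocol described in the abstract, the following is a minimal sketch of 5-fold cross-validation around a nearest-mean-style classifier. It is not the authors' implementation. It assumes that NMC denotes a nearest mean classifier, uses squared Euclidean distance as an assumed metric, and runs on a small synthetic binary cell-by-peak matrix rather than any of the paper's datasets. All function names (`fit_nearest_mean`, `predict_nearest_mean`, `cross_validate`, `make_toy_cells`) are hypothetical.

```python
import random

def fit_nearest_mean(X, y):
    """Return one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict_nearest_mean(centroids, row):
    """Assign the label whose centroid is closest (squared Euclidean distance, an assumption)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(X, y, k=5, seed=0):
    """Shuffle the cells, split them into k folds, and return the accuracy on each held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        centroids = fit_nearest_mean([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_nearest_mean(centroids, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

def make_toy_cells(n_per_type=50, n_peaks=20, seed=0):
    """Synthetic stand-in for scATAC-seq data: two cell types, each opening a different half of the peaks more often."""
    rng = random.Random(seed)
    half = n_peaks // 2
    X, y = [], []
    for label, open_peaks in (("type_A", range(0, half)), ("type_B", range(half, n_peaks))):
        for _ in range(n_per_type):
            row = [1 if rng.random() < (0.6 if j in open_peaks else 0.1) else 0
                   for j in range(n_peaks)]
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_cells()
fold_accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print("per-fold accuracy:", fold_accuracies)
print("mean accuracy:", mean_accuracy)
```

An inter-dataset experiment of the kind the abstract mentions would instead fit the centroids on one dataset and predict on another, which in practice also requires that both datasets share a common feature set.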