Identification of DNase I Hypersensitive Sites in the Human Genome by Multiple Sequence Descriptors

Yan-Ting Jin, Yang Tan, Zhong-Hua Gan, Yu-Duo Hao, Tian-Yu Wang, Hao Lin, Bo Tang
DOI: https://doi.org/10.1016/j.ymeth.2024.06.012
IF: 4.647
2024-01-01
Methods
Abstract: DNase I hypersensitive sites (DHSs) are chromatin regions that are highly sensitive to cleavage by the DNase I enzyme. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and for localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments for DHS identification exist, they are often labor-intensive, so there is a strong need for computational methods. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture sequence information about human DHSs, and the F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was selected to build the final model. The model achieved an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist researchers conducting DNase research in identifying these sites.
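The feature extraction and F-score filtering steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the seven descriptors, so normalized k-mer composition is used here purely as an assumed stand-in. The F-score formula below is the commonly used two-class definition (between-class separation divided by the sum of within-class sample variances) and may differ from the exact variant the authors applied. All function names are hypothetical.

```python
from itertools import product
from statistics import mean


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies over the ACGT alphabet, in lexicographic order.

    Assumption: k-mer composition stands in for one of the paper's
    (unnamed) sequence descriptors.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def f_score(pos, neg):
    """F-score of a single feature from its values in the two classes.

    Assumed definition: squared deviations of the class means from the
    overall mean, divided by the sum of the class sample variances.
    Each class needs at least two samples.
    """
    all_mean = mean(pos + neg)
    pos_mean, neg_mean = mean(pos), mean(neg)
    between = (pos_mean - all_mean) ** 2 + (neg_mean - all_mean) ** 2
    within = (
        sum((x - pos_mean) ** 2 for x in pos) / (len(pos) - 1)
        + sum((x - neg_mean) ** 2 for x in neg) / (len(neg) - 1)
    )
    return between / within if within else 0.0


def rank_features(X_pos, X_neg):
    """Return feature indices sorted by descending F-score, plus the scores."""
    n_features = len(X_pos[0])
    scores = [
        f_score([row[j] for row in X_pos], [row[j] for row in X_neg])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

In a pipeline like the one the abstract outlines, the top-ranked features from `rank_features` would then be passed to a classifier such as a random forest and evaluated with five-fold cross-validation; that step is omitted here to keep the sketch dependency-free.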