HIV-1 M group subtype classification using deep learning approach

Sihua Peng
DOI: https://doi.org/10.1016/j.compbiomed.2024.109218
Abstract: Traditionally, the classification of HIV-1 M group subtypes has depended on statistical methods constrained by sample sizes. Here, HIV-1-M-SPBEnv is proposed as the first deep learning-based method for classifying HIV-1 M group subtypes from env gene sequences. This approach overcomes sample size limitations by using artificial molecular evolution techniques to generate a synthetic dataset suitable for machine learning. Employing a convolutional autoencoder embedded with two residual blocks and two transpose residual blocks, followed by a fully connected neural network block, HIV-1-M-SPBEnv reduces complex, high-dimensional DNA sequence data to concise, information-rich, low-dimensional representations, achieving exceptional classification accuracy. In validation on an independent dataset, the precision, accuracy, recall, and F1 score of the HIV-1-M-SPBEnv model predictions were all 100%, confirming its capability to accurately identify all 12 subtypes of the HIV-1 M group. Deployed through a web server, it provides seamless HIV-1 M group subtype prediction for researchers and clinicians. The HIV-1-M-SPBEnv web server is accessible at http://www.hivsubclass.com and all the code is available at https://github.com/pengsihua2023/HIV-1-M-SPBEnv.
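The four metrics reported in the abstract can be illustrated with a minimal sketch for a multi-class subtype classifier. This is not the authors' evaluation code: the function name `macro_metrics`, the choice of macro averaging across subtypes, and the example labels are all assumptions made for illustration, since the abstract does not state how the per-class scores were aggregated.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count per-class true positives, false positives, and false negatives.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Illustrative subtype labels only; not data from the paper.
perfect = macro_metrics(["A", "B", "C", "B"], ["A", "B", "C", "B"])
imperfect = macro_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

When every prediction matches its true subtype, as in `perfect`, all four values equal 1.0, which corresponds to the 100% figures quoted above. A single misclassification, as in `imperfect`, lowers each metric by a different amount, which is why the four are usually reported together.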