Generating Minimal Models of H1N1 NS1 Gene Sequences Using Alignment-Based and Alignment-Free Algorithms.

Meng Fang,Jiawei Xu,Nan Sun,Stephen S. -T. Yau
DOI: https://doi.org/10.3390/genes14010186
IF: 4.141
2023-01-01
Genes
Abstract:For virus classification and tracing, one idea is to generate minimal models from the gene sequences of each virus group for comparative analysis within and between classes, as well as classification and tracing of new sequences. The starting point of defining a minimal model for a group of gene sequences is to find their longest common sequence (LCS), but this is a non-deterministic polynomial-time hard (NP-hard) problem. Therefore, we applied some heuristic approaches of finding LCS, as well as some of the newer methods of treating gene sequences, including multiple sequence alignment (MSA) and k-mer natural vector (NV) encoding. To evaluate our algorithms, a five-fold cross validation classification scheme on a dataset of H1N1 virus non-structural protein 1 (NS1) gene was analyzed. The results indicate that the MSA-based algorithm has the best performance measured by classification accuracy, while the NV-based algorithm exhibits advantages in the time complexity of generating minimal models.
What problem does this paper attempt to address?