An advanced approach for DNA sequencing and similarities analysis on the basis of groupings of nucleotide bases

Kshatrapal Singh, Laxman Singh, Vijay Shukla, Yogesh Kumar Sharma, Arun Kumar Rai
DOI: https://doi.org/10.1504/ijdmb.2025.143005
2024-12-04
International Journal of Data Mining and Bioinformatics
Abstract: DNA sequencing is a crucial tool for identifying the links between DNA sequences on a broad scale, but there is still room for improvement in sequencing quality. Alignment-free techniques are a widely used way to determine sequence similarities. In this research, the four DNA bases A, C, G, and T are divided into three types of groupings according to their chemical characteristics, and a primary DNA sequence is transformed into three symbolic sequences. To represent the sequence, the frequencies of group variations in the three symbolic sequences are aggregated into a 12-component vector. The beta globin gene nucleotide sequences of several species are characterised and compared using the Euclidean distances between the resulting vectors. The evolutionary relationships between the organisms are represented visually with phylogenetic trees, whose branch structure shows how species or other groups diverged from common ancestors. Our findings agree with recent biological assessments. We also compared our approach with several existing sequence comparison techniques and found it to be more efficient and user-friendly, and we analysed its time and space complexity.
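The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three groupings or define "group variations", so the code makes two assumptions: the groupings are the three standard chemical classifications of bases (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding), and a group variation is an adjacent pair of group symbols, which gives 4 frequencies per symbolic sequence and 12 in total. All function and variable names are hypothetical and are not taken from the paper.

```python
import math
from itertools import product

# ASSUMPTION: the abstract does not list the groupings. These are the three
# standard chemical classifications of the four bases.
GROUPINGS = {
    "purine_pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino_keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak_strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}


def symbolic_sequence(seq, mapping):
    """Replace each base by its group symbol under one grouping."""
    return "".join(mapping[base] for base in seq.upper())


def pair_frequencies(symbols, alphabet):
    """Relative frequencies of the 4 adjacent symbol pairs in a 2-letter sequence."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(symbols) - 1):
        counts[symbols[i:i + 2]] += 1
    total = max(len(symbols) - 1, 1)  # avoid division by zero for short input
    return [counts[p] / total for p in pairs]


def feature_vector(seq):
    """Concatenate the pair frequencies of the three symbolic sequences (3 x 4 = 12)."""
    vector = []
    for mapping in GROUPINGS.values():
        alphabet = sorted(set(mapping.values()))
        vector.extend(pair_frequencies(symbolic_sequence(seq, mapping), alphabet))
    return vector


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.dist(u, v)


if __name__ == "__main__":
    # Toy sequences for illustration only, not the beta globin data.
    a = feature_vector("ATGGTGCACCTGACTCCTGAG")
    b = feature_vector("ATGGTGCATCTGACTCCTGAG")
    print(len(a), euclidean_distance(a, b))
```

In this sketch, any character other than A, C, G, or T raises a `KeyError`, so ambiguous bases would need to be filtered beforehand. The pairwise distances it produces could then be passed to a standard tree-building method to obtain a phylogenetic tree; the abstract does not say which method the authors used.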
mathematical & computational biology