A simple software program for sorting biological sequences into unique groups.

Babu Bassa
DOI: https://doi.org/10.26434/chemrxiv-2024-qpm6b
2024-03-01
Abstract: This communication describes a software tool named "ChameleonSort". The program, developed by the author, sorts biological sequence variants, such as those that accumulate mutations while diverging from a common ancestor. Examples include viral protein variants, protein isomers, protein orthologs, and polynucleotide sequences such as DNA and RNA (1, 2). The program sorts the query sequences into unique groups. Each output group holds sequences with one unique permutation of monomer units (amino acids or nucleotides): the sequences within a group are identical to one another, and they differ from the sequences of every other group at one or more amino acid positions or in length by at least one monomeric unit. The algorithm is implemented in Visual Basic, a language included in Microsoft’s Visual Studio (3). The entire code is provided as supplementary data with this communication. The user-friendly program is available free of charge on GitHub (4) for download on Windows 10 or later.
Chemistry
What problem does this paper attempt to address?
The paper introduces a software tool called ChameleonSort for classifying biological sequences, such as viral protein variants, protein isomers, protein orthologs, and polynucleotide sequences like DNA and RNA. These sequences may accumulate mutations during evolution and so diverge from a common ancestor. ChameleonSort sorts the input query sequences into groups, each with a unique permutation of monomers: sequences within a group are identical to each other but differ from sequences in every other group at one or more amino acid positions, or differ in length by at least one monomeric unit. The algorithm works much like a fisherman sorting a catch by species: a fish of a new species gets a new bin, and any other fish goes into the existing bin for its species. The program matches and groups sequences by comparing the unique hexadecimal value of each amino acid or nucleotide monomer, so no reference sequence is needed. It is implemented in Visual Basic and can be downloaded free of charge from GitHub for Windows 10 or later. To validate ChameleonSort, the author tested it on simulated sequences and showed that it correctly sorts them into sets with unique permutations. The paper also explains how to sort on a specific region of a protein molecule and gives examples of how to organize and analyze the results. In summary, the paper addresses the efficient classification of biological sequence variants with a simple, user-friendly, and free tool that supports the study of protein diversity and variation.
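The grouping procedure summarized above can be illustrated with a minimal Python sketch. This is not the author's Visual Basic code. The names `same_sequence` and `sort_into_groups` and the `start`/`end` region parameters are hypothetical and chosen only for illustration. Comparing character codes with `ord` stands in for the per-monomer hexadecimal comparison that the summary mentions, and that substitution is an assumption.

```python
from typing import List, Optional, Tuple


def same_sequence(a: str, b: str) -> bool:
    """Return True if two sequences match in length and at every position.

    Monomers are compared by character code (an assumed stand-in for the
    per-monomer hexadecimal values mentioned in the summary).
    """
    if len(a) != len(b):
        return False
    return all(ord(x) == ord(y) for x, y in zip(a, b))


def sort_into_groups(
    sequences: List[str],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[Tuple[str, List[int]]]:
    """Group identical sequences in order of first appearance.

    Each group is (representative region, indices of member sequences).
    start/end are hypothetical parameters that restrict the comparison
    to a region of each sequence (Python slice semantics).
    """
    groups: List[Tuple[str, List[int]]] = []
    for index, seq in enumerate(sequences):
        region = seq[start:end]
        for representative, members in groups:
            if same_sequence(region, representative):
                members.append(index)  # known "species": add to its bin
                break
        else:
            groups.append((region, [index]))  # new "species": open a new bin
    return groups


if __name__ == "__main__":
    queries = ["MKTAYIAK", "MKTAYIAK", "MKTAYLAK", "MKTAYIA", "MKTAYLAK"]
    for representative, members in sort_into_groups(queries):
        print(representative, members)
    # MKTAYIAK [0, 1]
    # MKTAYLAK [2, 4]
    # MKTAYIA [3]
```

No reference sequence is involved: each query is compared only against the representatives of groups that already exist. Restricting the comparison to the first five residues, for example `sort_into_groups(queries, start=0, end=5)`, places all five example sequences in a single group, which mirrors region-specific sorting. A dictionary keyed on the sequence would be faster for large inputs. The explicit loop is used here only because it follows the fisherman analogy step by step.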