Pre-processing annotated homologous regions in protein sequences concerning machine-learning applications

Nawar Malhis
DOI: https://doi.org/10.1101/2024.10.25.620288
2024-10-25
Abstract: While annotated protein sequences are widely used in machine-learning applications, pre-processing these sequences for homology is mostly limited to clustering complete sequences by global alignment, without considering their annotations. Here, I introduce new tools that identify all possible local homologies between annotated sequences, within the same dataset or across two datasets, and then resolve these homologies.
Bioinformatics