An Online Cluster Analysis Method for Large-Scale Protein Sequences

DongMing Tang, QingXin Zhu, YueFei Zhang, Jiang Zhang
DOI: https://doi.org/10.1109/fbie.2009.5405808
2009-01-01
Abstract: As high-throughput sequencing technologies continue to improve, biomedical databases hold an overwhelming number of unannotated protein sequences. Clustering protein sequences into homologous groups can help annotate uncharacterized sequences. In this paper, we introduce OnlineCAPS, an online cluster analysis method for large-scale protein sequences that combines online clustering algorithms with an alignment-free similarity measure for protein sequences. OnlineCAPS has several advantages: its memory requirements and computation cost are very low; it is fast and can extract clusters from a large-scale set of protein sequences; and it can be deployed on a web server, where it can perform clustering while a sequence dataset is being uploaded. The experimental results illustrate the efficiency of the proposed method.
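The abstract does not say which online clustering algorithm or which alignment-free measure OnlineCAPS uses. As a rough illustration of the general idea only, the sketch below assumes a leader-style single-pass clusterer and cosine similarity over k-mer counts. The names `kmer_profile`, `cosine`, and `OnlineClusterer`, the threshold of 0.5, and k = 2 are all hypothetical choices, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count overlapping k-mers: a simple alignment-free representation (assumed)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Leader-style online clustering (an assumption, not the paper's algorithm).

    Each sequence is seen once and only one summed profile per cluster is kept,
    so memory grows with the number of clusters rather than the number of sequences.
    """

    def __init__(self, threshold=0.5, k=2):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.k = k
        self.centroids = []  # one summed k-mer profile per cluster
        self.members = []    # sequence ids per cluster

    def add(self, seq_id, seq):
        """Assign one incoming sequence and return its cluster index."""
        profile = kmer_profile(seq, self.k)
        best, best_sim = None, self.threshold
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(profile, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No cluster is similar enough: start a new one.
            self.centroids.append(profile)
            self.members.append([seq_id])
            return len(self.centroids) - 1
        # Fold the new sequence into the best-matching cluster.
        self.centroids[best].update(profile)
        self.members[best].append(seq_id)
        return best
```

Because `add` handles one sequence at a time, a web server could call it for each record as an upload streams in, which is the kind of deployment the abstract describes.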