ECPF: an Efficient Algorithm for Expanding Clustered Protein Families

Zhongyang Zuo, Yanheng Liu, Liyan Zhao, Li Xu, Jian Wang, Xiaoyan Lv
DOI: https://doi.org/10.1504/ijdmb.2016.10003177
2016-01-01
International Journal of Data Mining and Bioinformatics
Abstract: With the rapid development of gene sequencing technology, the era of explosive growth in protein sequences has arrived. Handling such a huge number of protein sequences has become a serious concern in the research community. An effective solution is to cluster homologous sequences into separate protein families. Proteins belonging to the same family share similar structures and/or gene functions, so known proteins can provide valuable evidence for characterising unknown ones. We present an efficient and effective algorithm called Expanding Clustered Protein Families (ECPF), which optimises previously clustered protein sequences. The results show that ECPF can discover previously unknown connections between storage space and families in large-scale databases at an acceptable computational cost. ECPF successfully expands the protein sequence network and produces a more practical protein sequence topology to support biological research.
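As a rough illustration of the generic clustering step the abstract mentions (grouping homologous sequences into families), and not of ECPF itself, whose procedure the abstract does not describe, the sketch below forms families as connected components of a similarity graph, i.e. single-linkage clustering with union-find. The function name `cluster_families`, the `hits` list of (sequence, sequence, score) tuples, and the `threshold` cutoff are all hypothetical; in practice the scores would come from an all-against-all sequence similarity search.

```python
from collections import defaultdict


def cluster_families(sequences, hits, threshold):
    """Hypothetical helper: group sequences into families as the connected
    components of a graph whose edges are hits with score >= threshold.
    Every sequence named in `hits` must also appear in `sequences`."""
    parent = {s: s for s in sequences}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the two families containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Only sufficiently similar pairs link sequences into one family.
    for a, b, score in hits:
        if score >= threshold:
            union(a, b)

    groups = defaultdict(list)
    for s in sequences:
        groups[find(s)].append(s)
    # Sort members and families so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    seqs = ["P1", "P2", "P3", "P4", "P5"]
    hits = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P4", "P5", 0.3)]
    print(cluster_families(seqs, hits, 0.5))
```

With a cutoff of 0.5, P1, P2 and P3 end up in one family through transitive links, while P4 and P5 stay apart because their only hit falls below the threshold. Single linkage is chosen here only because it is the simplest way to turn pairwise hits into families; it is prone to chaining unrelated sequences together, which is one reason refinement methods for clustered families are of interest.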