Abstract: An exhaustive literature survey shows that measuring protein/gene similarity is an important step towards solving widespread bioinformatics problems, such as predicting protein-protein interactions, analyzing Protein-Protein Interaction Networks (PPINs), gene prioritization, and disease gene/protein detection. In this article, we propose an improved 3-in-1 fused protein similarity measure called FuSim-II. It combines, through a weighted average, the biological knowledge extracted from three genomic/proteomic resources: Gene Ontology (GO), PPIN, and protein sequence. Furthermore, we show how the proposed measure can be applied to detect potential hub-proteins in a given PPIN. To that end, we propose a multi-objective clustering-based hub-protein detection framework with FuSim-II as the underlying proximity measure. The PPINs of H. sapiens and M. musculus are used in the experiments. Unlike most existing hub-detection methods, the proposed technique does not require any protein degree cut-off or threshold to define hubs. The efficiency of the proposed measure is thoroughly assessed against eight existing protein similarity measures, in combination with eight single/multi-objective clustering methods. Internal cluster validity indices, namely Silhouette and Davies-Bouldin (DB), are used for the analytical study. In addition, the proposed hub-detection approach is compared with five existing hub-protein detection algorithms through an essentiality enrichment study. The reported results show the improved performance of FuSim-II over existing protein similarity measures in identifying functionally related proteins as well as relevant hub-proteins. Supplementary material is available at http://csse.szu.edu.cn/staff/cuilz/eng/index.html.
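The abstract describes FuSim-II only as a weighted average of GO-, PPIN-, and sequence-derived similarities, and names Silhouette as one of the validity indices; it does not give the component measures or the weights. The following is a minimal sketch under stated assumptions, not the authors' implementation: the three component matrices are taken as precomputed, symmetric, n x n, with values in [0, 1]; the equal default weights are illustrative; distance is assumed to be 1 minus the fused similarity; and the names `fuse_similarity` and `mean_silhouette` are hypothetical.

```python
"""Hypothetical sketch: fuse three similarity matrices and score a clustering.

Assumptions (not taken from the FuSim-II paper): the component matrices are
already computed, symmetric, n x n, with values in [0, 1]; the weights are
illustrative; distance is taken as 1 - fused similarity.
"""
from typing import Sequence

Matrix = list[list[float]]


def fuse_similarity(
    sim_go: Matrix,
    sim_ppin: Matrix,
    sim_seq: Matrix,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> Matrix:
    """Return the element-wise weighted average of the three matrices."""
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("expected three non-negative weights with a positive sum")
    total = sum(weights)
    w_go, w_ppin, w_seq = (w / total for w in weights)  # normalize to sum to 1
    n = len(sim_go)
    for m in (sim_go, sim_ppin, sim_seq):
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all similarity matrices must be n x n")
    return [
        [
            w_go * sim_go[i][j] + w_ppin * sim_ppin[i][j] + w_seq * sim_seq[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def mean_silhouette(dist: Matrix, labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points, from a distance matrix.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b); points in singleton clusters score 0.
    """
    n = len(dist)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    # Toy data for four hypothetical proteins forming two functional pairs.
    go = [[1, .9, .1, .2], [.9, 1, .2, .1], [.1, .2, 1, .8], [.2, .1, .8, 1]]
    ppin = [[1, .8, 0, .1], [.8, 1, .1, 0], [0, .1, 1, .9], [.1, 0, .9, 1]]
    seq = [[1, .7, .2, .1], [.7, 1, .1, .2], [.2, .1, 1, .7], [.1, .2, .7, 1]]
    fused = fuse_similarity(go, ppin, seq)
    dist = [[1 - s for s in row] for row in fused]
    print(round(fused[0][1], 3), round(mean_silhouette(dist, [0, 0, 1, 1]), 3))
```

On this toy input, labeling the two functional pairs correctly gives a clearly positive mean silhouette, while swapping the labels gives a negative one. Normalizing the weights inside the function keeps the fused score in [0, 1] whenever the inputs are, regardless of how the hypothetical weights are specified.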

Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure

Adjustable Preference Affinity Propagation Clustering

A Similarity Computing Algorithm for Proteins

Towards Automatic Clustering of Protein Sequences

A Refined 3-in-1 Fused Protein Similarity Measure: Application in Threshold-Free Hub Detection

A Multiple Criteria Framework for 3D Protein Structure Similarity Retrieval

Grouping of Amino Acids and Recognition of Protein Structurally Conserved Regions by Reduced Alphabets of Amino Acids

An efficient parallel algorithm for multiple sequence similarities calculation using a low complexity method

Detecting Protein Complexes by an Improved Affinity Propagation Algorithm in Protein-Protein Interaction Networks

Modified Semi-Supervised Affinity Propagation Clustering with Fuzzy Density Fruit Fly Optimization

Automated Hub-Protein Detection Via a New Fused Similarity Measure-Based Multi-objective Clustering Framework

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Comparison of Methods for Biological Sequence Clustering

Combining Local Graph Clustering and Similarity Measure for Complex Detection

CLUSEQ: Efficient and Effective Sequence Clustering

An Improved AP Algorithm for Identifying Overlapping Functional Modules in Protein-Protein Interaction Networks

A Degree-Distribution Based Hierarchical Agglomerative Clustering Algorithm for Protein Complexes Identification

Cluster Ensemble Algorithm Using Affinity Propagation

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Adaptive Semi-Supervised Affinity Propagation Clustering Algorithm Based on Structural Similarity

A clustering effectiveness measurement model based on merging similar clusters