A Network Filtration Protocol for Elucidating Relationships between Families in a Protein Similarity Network

Leonard Apeltsin
DOI: https://doi.org/10.48550/arXiv.1408.2575
2014-08-12
Abstract:
Motivation: The study of diverse enzyme superfamilies can provide important insight into the relationships between protein sequence, structure and function. However, these relationships are often difficult to discover across a large and diverse superfamily. Contemporary similarity network visualization techniques allow researchers to aggregate sequence similarity information into a single global view. Network visualization provides a qualitative estimate of functional diversity within a superfamily, but it cannot quantify explicit boundaries, when present, between neighboring families in sequence space. This limits the potential of existing sequence-based algorithms to generate functional predictions from superfamily datasets.
Results: Building on current network analysis tools, we have developed a new algorithm for elucidating pairs of homologous families within a sequence dataset. The algorithm filters a dense similarity network to estimate both the boundaries of individual families and how those families neighbor one another. Globally, these neighboring families define a topology across the entire superfamily, which is straightforward to interpret by visualizing the network produced by our filtration protocol. We compared the network topology within the kinase superfamily against available phylogenetic data. Our results suggest that neighbors within the filtered kinase network are more likely to share structural and functional properties than are more distant network clusters.
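The abstract does not spell out the filtration algorithm itself. As a rough illustration only, the following is a generic threshold-based sketch and not the author's protocol. Edges whose similarity score falls below a hypothetical cutoff are dropped. The connected components that survive are treated as candidate families. Each family is then linked to the family with which it shares the strongest cross-cluster edge in the unfiltered network. The function names (`filter_network`, `connected_components`, `cluster_neighbors`), the cutoff of 0.8, and the toy scores are all assumptions made for this example.

```python
from collections import defaultdict


def filter_network(edges, threshold):
    """Keep only edges whose similarity score meets a (hypothetical) cutoff."""
    return [(a, b, s) for a, b, s in edges if s >= threshold]


def connected_components(nodes, edges):
    """Group nodes into clusters with union-find; returns clusters in a stable sorted order."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((frozenset(g) for g in groups.values()), key=lambda g: sorted(g))


def cluster_neighbors(clusters, edges):
    """Map each cluster index to the cluster it shares the strongest cross-cluster edge with."""
    label = {n: i for i, c in enumerate(clusters) for n in c}
    best = {}  # (i, j) with i < j -> strongest score between clusters i and j
    for a, b, s in edges:
        i, j = label[a], label[b]
        if i == j:
            continue  # intra-cluster edge, not relevant to the topology
        key = (min(i, j), max(i, j))
        best[key] = max(best.get(key, float("-inf")), s)

    nearest = {}  # cluster index -> (neighbor index, score)
    for (i, j), s in best.items():
        for x, y in ((i, j), (j, i)):
            if x not in nearest or s > nearest[x][1]:
                nearest[x] = (y, s)
    return {i: pair[0] for i, pair in nearest.items()}


# Toy similarity scores (hypothetical): three tight pairs linked by weaker edges.
edges = [
    ("A", "B", 0.95), ("C", "D", 0.90), ("E", "F", 0.93),
    ("B", "C", 0.60), ("D", "E", 0.55), ("A", "F", 0.20),
]
nodes = {n for a, b, _ in edges for n in (a, b)}
clusters = connected_components(nodes, filter_network(edges, threshold=0.8))
neighbors = cluster_neighbors(clusters, edges)
print(clusters)
print(neighbors)
```

With these toy scores the cutoff yields three clusters, {A, B}, {C, D} and {E, F}, and the nearest-neighbor links form a simple chain in which the middle cluster neighbors both of the others. A single global cutoff is the simplest possible filtering rule. A real superfamily network would likely need a more careful choice of threshold or filtering criterion.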
Quantitative Methods