Editorial (Thematic Issue: Protein Systems Biology: Method, Regulation, and Network)
Qingfeng Chen,Ming Chen
DOI: https://doi.org/10.2174/1389203715666140724091730
2014-01-01
Abstract: The advent of various advanced biological experimental techniques for discovering conservation and interactions between molecules has prompted the systematic study of biological regulation networks. The growing number of organisms studied, together with genome-scale conservation and co-expression data, yields valuable data sources for exploring the functional and regulatory roles of biological systems. A biological network can readily be represented as a graph by mapping its basic entities and interactions to nodes and edges, respectively. It is important to develop effective methods for comparing such networks across organisms and discovering their common or frequent subgraphs. This can help reveal their functions and the exact features or schemes that carry out these functions. There have been increasing efforts to find characteristic interaction patterns conserved across several biological networks. The purpose of this special issue is to discuss the state of the art in techniques and methods for discovering diverse regulatory networks with respect to metabolism, protein-protein interaction (PPI), gene expression, and signaling pathways.

THE ADVENT OF BIOLOGY BIG DATA

There has been an explosion of biological data owing to the application of advanced experimental technologies, such as next-generation sequencing or deep sequencing. For example, the European Bioinformatics Institute (EBI), “one of the world's largest biology-data repositories, currently contains 20 petabytes (1 petabyte is 10^15 bytes) of data and back-ups regarding genes, proteins and small molecules” [1]. This not only makes it possible to perform a comprehensive analysis of the genome and transcriptome of a specific species, but also creates a major challenge in handling, processing and extracting information from massive data sets. These data have been widely used to produce various research solutions.
Since high-throughput instruments have greatly reduced costs, small biology labs can also generate big data. Big-data users may also come from small labs that lack such facilities but can access online data from public repositories. Biological data are usually more heterogeneous than conventional transaction data, since a variety of experiments give rise to different kinds of information, such as protein-protein interactions (PPI), various sequences, RNA secondary structures and detections in the transcriptome. This generates a demand for biological data mining that can access big data sets and integrate, analyze, compare and interpret the complex data [2]. In many ongoing studies, data sharing has become a popular basis for large-scale genomic or proteomic comparisons. However, the traditional approach of downloading the data, storing them on a local computer and analyzing them there is time-consuming and costly. Moreover, not all users have the required computational facilities, such as supercomputers and software. This clearly calls for flexible, public computational platforms, including intelligent strategies for the storage, management and analysis of biological big data. A number of companies, institutes and labs have been established for different commercial and academic purposes, such as the National Center for Biotechnology Information (NCBI), EBI and the Beijing Genomics Institute (BGI). They provide open-access data sources, including data download and search, and usually allow a data set to be obtained from a single location. Helping scientists share data across countries and regions also requires shared computational resources, so that users can access hardware and software on demand. Cloud computing is a recently emerging technology for coping with biological big-data mining.
Cloud platforms not only offer virtual storage for the data, software and results that a selected group of collaborators share, but also prevent unauthorized users from accessing them [3]. Scientists can download and upload data and use software via cloud-based platforms. Many data sets and software programs are now hosted in large offsite centers. For example, the IT Center for Science (http://www.csc.fi/english) is a high-performance computing centre run and funded by the government of Finland. Embassy Cloud is a cloud-computing component for ELIXIR by EBI, which provides safe computational environments and a data download service for comparison purposes. These cloud-based infrastructures address the continued growth of data and give scientists quick access to the information they need. Nevertheless, transferring big data between local and remote sites and sharing data between collaborators remain major challenges owing to unexpected interruptions of data transfer.

Regulatory networks have become a prevalent way to store and manage large volumes of biological data by modeling molecular interactions. In-depth study of biomolecular networks and their correlations assists in understanding cellular behaviors and uncovering their functions in cellular systems. As a result, protein systems biology, one of the most important forms of network systems biology, will play a central role in life science. Life science is becoming data-driven. Big-data science, including data management, sharing and analysis, is useful for constructing the dynamic, interacting protein regulatory networks of an organism.

THE IMPORTANCE OF PROTEIN SYSTEMS BIOLOGY

Systems biology has been an active research topic since 2000, spanning the construction of diverse biological systems, data visualization, and big-data management and analysis in molecular biology and biomedicine.
Molecules, cells, tissues and organs are not independent but perform their functions together in a systematic way. Systems biology has been widely applied in both biological and biomedical studies to explore complex interactions between the components of biological systems by means of computational methods and mathematical models. Gene networks and protein networks are two typical networks in systems biology, in which the properties and patterns of protein-based regulation and gene-related components play an important role in understanding the functions and behaviors of biological systems [4].

A number of commercial and academic research institutes, centers and labs have been established for systems biology investigation. The FAS Center for Systems Biology is an interdepartmental initiative at Harvard University that aims to explain the structure, behavior and evolution of cells and organisms by combining quantitative and systematic measurement, including genomics and proteomics, with computational biology and mathematical models to extract and describe the dynamical behavior of groups of interacting components. Systems problems have become an important topic across computational biology research and drug design. The New South Wales Systems Biology Initiative (http://www.systemsbiology.org.au/) was funded by the Australian Research Council and the NSW State Government. It is located at the University of New South Wales and aims to develop bioinformatics algorithms and tools for genomics and proteomics. The Systems Biology Institute (SBI) was established in 2000 to facilitate systems biology research in several important areas relating to healthcare and global sustainability. It has been involved in a number of research programs, mostly supported by the Japanese government and private foundations.
“Pathways have been viewed as a convenient way of summarizing the results of a collection of experiments to describe the flow of signals or metabolites in a cell. A number of databases regarding metabolic and signaling pathways are developed to represent the relationships between molecules involved in various events, including reactions or as activation or inhibition” [3]. Despite many attempts to extract the properties and details of interactions, such as phosphorylation sites, there is generally insufficient functional detail to interpret the actual meaning of a link between two proteins. Knowledge of the relevant molecules, their identified binding sites and their interactions can greatly improve the understanding of protein systems biology.

THE MOTIVATION FOR NOVEL COMPUTATIONAL METHODS

A great many molecular interactions have been unveiled, but knowledge of the precise interactions is still far from complete. The difficulty of predicting the behavior of the genes and proteins involved arises mainly from the complexity of turning an abstract biological system into models that faithfully reflect it, and from the heterogeneity and size of biological big data drawn from multiple sources. The paradigm of systems biology thus generates a demand for computational methods, interaction prediction and network construction. High-throughput sequencing projects have identified collections of components that function in an organism, and many post-genomic studies aim to extract the relationships among them. Systems biology is thus motivated to make sense of these relationships by considering them together, and to simulate how the participating molecules work together to produce a designated outcome or perform targeted functions. As a result, the focus has moved from traditional molecular biology, which studies single molecules, to systems biology, which explores pathways, complexes or even whole organisms.
To understand diverse pathways and/or networks of gene regulation, scientists need a good knowledge of the relationships between proteins, between proteins and metabolites, and between proteins and nucleic acids. Structural information has been a useful route to a comprehensive understanding of interactions between molecules, since it provides atomic details about binding. However, obtaining detailed structural information for large complexes or whole systems takes time. This motivates the development of new computational methods to discover and model the relationships between interacting molecules.

CONTRIBUTIONS TO THIS ISSUE

The articles included in this special issue are classified into protein function prediction, protein-protein interaction, and protein regulation patterns. Jianhong Zhou et al. review methods for the construction and characterization of amino acid networks. The authors summarize and discuss the network properties applied to native structure selection, and offer a perspective on the future application of amino acid networks to detecting the native fold among decoy sets. Wei Peng et al. propose an unbalanced bi-random walk (UBiRW) algorithm for predicting protein functions, which iteratively walks different numbers of steps in the two networks to find protein-GO term associations from known associations. “The interface in a complex involves two structurally matched protein subunits, and the binding sites can be predicted by identifying structural matches at protein surfaces” [5]. Understanding the energetics and mechanisms of complexes remains one of the essential problems in binding site prediction. Fei Guo et al. developed a system, PBinder, for identifying binding sites based on structural compatibility, side-chain conformations, amino acid types and contact energies. The system reports improvements in prediction correctness in terms of both accuracy and coverage.
Among the most important networks maintaining biological functions, protein-protein interactions span from local binary interactions to an entire cell. Understanding how interacting partners recognize and bind each other precisely is still a long-sought scientific goal. Compared with other existing methods, the least squares regression (LSR) approach proposed by De-Shuang Huang et al. is a powerful tool for characterizing protein-protein correlations and inferring PPIs, while maintaining high performance in predicting PPI networks. The review article by Chiranjib Chakraborty et al. enhances our knowledge of how PPI network architecture can be used to validate a drug target. It concludes by suggesting future directions for PPI in target discovery and drug design. After reviewing the key regulators in the hydroxylated triacylglycerol ricinoleate biosynthesis pathway of castor bean, Yujie Chen et al. analyze several of them in terms of structure/function prediction and similar expression patterns, aiming to give insight into the biosynthesis of this energy-rich molecule and the roles the key regulators play in the pathway. Lili Liu et al. define the organelle-focused proteome and interactome of rice based on manual annotation, manual adjustment and cross-validation of predictors. Furthermore, the cross-talk bias between different organelles and the functional organization of nine organelles are explored. Because microRNAs (miRNAs) are recognized as important regulators in a wide range of biological processes, Wei Lan et al. explore overlooked positions in miRNAs based on sequence and structural features. These functions may be exploited for miRNA-mediated regulation of protein expression. Collision entropy is applied to measure the importance of each miRNA position. In particular, two thresholds are used to prune unimportant positions.
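As a rough illustration of the collision-entropy measure mentioned above, the following Python sketch computes the collision entropy (the Rényi entropy of order 2, i.e. the negative base-2 logarithm of the sum of squared symbol frequencies) at each position of a set of equal-length sequences. It is not the implementation of Wei Lan et al.: the toy sequences, the function names and the way the two thresholds are applied (keeping only positions whose entropy lies between `low` and `high`) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def collision_entropy(symbols):
    """Renyi entropy of order 2 in bits: H2 = log2(1 / sum_i p_i^2)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Probability that two independent draws give the same symbol.
    collision_prob = sum((c / total) ** 2 for c in counts.values())
    return math.log2(1.0 / collision_prob)


def positional_entropy(sequences):
    """Collision entropy of every column of a set of equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must have equal length")
    return [collision_entropy(column) for column in zip(*sequences)]


def keep_positions(entropies, low, high):
    """Hypothetical two-threshold filter: keep indices with low <= H2 <= high."""
    return [i for i, h in enumerate(entropies) if low <= h <= high]


# Toy, made-up sequences (not real miRNA data).
toy_sequences = ["UGAG", "UGAC", "UGCA", "UAGU"]
entropies = positional_entropy(toy_sequences)
kept = keep_positions(entropies, low=0.0, high=1.0)
print(entropies)
print(kept)
```

On this toy input the fully conserved first position has an entropy of 0 bits, while the last position, where each of the four nucleotides occurs once, reaches the maximum of 2 bits. With the illustrative band from 0 to 1 bit, only the first two positions are kept.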
The findings unveil important positions of miRNAs related to biogenesis and function. “Rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual framework that could potentially revolutionize our view of biology and disease pathologies in the twenty-first century” [6].

CONCLUSIONS

Systems biology has been successful in predicting the behavior of sets of molecules involved in biological systems and in understanding their interactions. As an important branch of systems biology, protein systems biology focuses on investigating the properties and patterns of protein-related interactions. Owing to the application of high-throughput sequencing techniques, biological big data has become a major challenge for both biologists and computer scientists. Traditional computational methods that rely on local data and local computational facilities have shown their limitations in handling large volumes of biological data from multiple sources. Thus, it is crucial to develop public computational platforms, including equipment and software for the storage, management and analysis of biological big data. This requires collaboration among scientists from biology, computer science and mathematics. The effort to understand the properties of the components of biological systems and their relationships gives rise to the research topics described in this special issue.

ACKNOWLEDGEMENTS

We wish to thank all the authors who have contributed their work to foster the dissemination of scientific excellence in the protein network biology field, and all the reviewers for giving their time and expertise to evaluate the manuscripts submitted for this publication. The work reported in this paper was partially supported by National Natural Science Foundation of China projects 61363025 and 31371328, and by two key projects of the Natural Science Foundation of Guangxi, 053006 and 019029.