Constructing networks for comparison of collagen types

Valentin Wesp, Lukas Scholz, Janine M. Ziermann-Canabarro, Stefan Schuster, Heiko Stark
DOI: https://doi.org/10.1101/2023.08.25.554753
2024-03-09
Abstract: Collagens are structural proteins that are predominantly found in the extracellular matrix of multicellular animals, where they are mainly responsible for the stability and structural integrity of various tissues. All collagens contain polypeptide strands (α-chains). There are several types of collagens, some of which differ significantly in form, function, and tissue specificity. Because of their importance in clinical research, they are grouped into subdivisions, the so-called collagen families, and their sequences are often analysed. However, problems arise with highly homologous sequence segments. To increase the accuracy of collagen classification and function prediction, taking into account the structure of these collagens and their expression in different tissues could help focus on sequence segments of interest. Here, we analyse collagen families with different levels of conservation. As a result, clusters with high interconnectivity can be found, such as the fibrillar collagens, the COL4 network-forming collagens, and the COL9 FACITs. Furthermore, a large cluster between network-forming, FACIT, and COL28a1 α-chains is formed with COL6a3 as a major hub node. The formation of clusters also shows why it is important to always analyse the α-chains and why structural changes can have a wide range of effects on the body.
Systems Biology
What problem does this paper attempt to address?
The problems that this paper attempts to solve mainly focus on the following aspects:

1. **Constructing an information network among different collagen α-chains**: By analyzing the different α-chains of collagen and their expression in different tissues, construct a network that reflects the relationships among these α-chains. This helps to reveal possible unknown interactions among different collagen types and to infer their spatial arrangements through simulation, for example within the basement membrane.
2. **Improving the accuracy of collagen classification and function prediction**: Because collagen sequences are highly homologous, traditional classification methods can run into problems. Analyzing the structure of collagens and their expression in different tissues should allow collagens to be classified more precisely and their functions to be predicted.
3. **Verifying the consistency of the constructed network with existing literature**: Compare the network with reference matrices compiled from PubMed and Google Scholar to check whether it agrees with existing research results. This step helps to confirm the validity and reliability of the network.

### Specific research objectives

- **Construct a collagen α-chain network based on high-confidence regions**: Compare short high-confidence regions to build a similarity matrix among collagen α-chains based on hydrophilicity and hydrophobicity, and generate a network graph from it.
- **Identify key parameters in the network**: Determine the optimal subsequence length and confidence threshold to obtain the most informative network.
- **Analyze the structural characteristics of the network**: Identify the sub-networks and how they are connected, and explore the relationships among different collagen types.
- **Compare with literature data**: Compare the network with literature data from PubMed and Google Scholar to verify whether it is consistent with existing research results.

### Research background

Collagens are the main structural proteins in the extracellular matrix (ECM) of multicellular animals, accounting for about 30% of the total mass of the ECM. They are widely present in tissues such as skin, connective tissue, blood vessel walls, bone, and cartilage, and are crucial for forming the basic tissue structure during embryonic development. Currently, 28 different collagen types are known, and they can be divided into six families. Because collagens are important in clinical research, analyzing their sequences is particularly important. However, highly homologous sequence segments make classification difficult. Structural analysis and the study of expression patterns can therefore help focus on the sequence segments of interest and improve the accuracy of classification and function prediction.
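The network-construction objectives above (short high-confidence subsequences, a similarity matrix based on hydrophilicity and hydrophobicity, and a graph derived from it) can be illustrated with a minimal sketch. It is not the authors' pipeline: the Kyte-Doolittle scale, the binary encoding (`o` hydrophobic, `i` hydrophilic), the per-residue confidence scores, Jaccard similarity over window sets, and the helper names `encode`, `confident_windows`, `similarity_matrix`, and `build_network` are all hypothetical choices made here. Only the two tunable parameters, subsequence length and confidence threshold, come from the summary.

```python
from itertools import combinations

# Kyte-Doolittle hydropathy scale (assumption: the scale the authors used is not stated here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str) -> str:
    """Reduce a protein sequence to two letters: 'o' hydrophobic (score > 0), 'i' hydrophilic."""
    return "".join("o" if KYTE_DOOLITTLE.get(aa, 0.0) > 0 else "i" for aa in seq)


def confident_windows(seq: str, conf: list[float], length: int, threshold: float) -> set[str]:
    """Encoded windows of the given length whose residues all reach the confidence threshold."""
    enc = encode(seq)
    return {
        enc[s:s + length]
        for s in range(len(seq) - length + 1)
        if all(c >= threshold for c in conf[s:s + length])
    }


def similarity_matrix(chains: dict, length: int, threshold: float) -> dict:
    """Jaccard similarity between the confident window sets of every pair of chains."""
    windows = {name: confident_windows(seq, conf, length, threshold)
               for name, (seq, conf) in chains.items()}
    sim = {}
    for a, b in combinations(sorted(windows), 2):
        union = windows[a] | windows[b]
        sim[(a, b)] = len(windows[a] & windows[b]) / len(union) if union else 0.0
    return sim


def build_network(sim: dict, min_similarity: float) -> set:
    """Undirected edges for all chain pairs whose similarity reaches min_similarity."""
    return {pair for pair, s in sim.items() if s >= min_similarity}


# Toy input (hypothetical sequences and confidence scores, not real collagen data).
chains = {
    "chainA": ("GPAGPLGPV", [90.0] * 9),
    "chainB": ("GPAGPLGPV", [90.0] * 6 + [40.0] * 3),
    "chainC": ("GDKGDEGDR", [90.0] * 9),
}
sim = similarity_matrix(chains, length=3, threshold=70.0)
edges = build_network(sim, min_similarity=0.5)
print(sim)
print(edges)
```

Sweeping `length` and `threshold` over a grid and inspecting the resulting edge sets is one way to explore the parameter choice the summary describes; the criterion for the "most informative" network is not specified here.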
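The analysis objectives (finding sub-networks and hub nodes, such as the COL6a3 hub reported in the abstract) and the comparison with literature-derived reference matrices can likewise be sketched on a toy graph. The choices here are assumptions rather than the paper's method: clusters are taken to be connected components, hubs are the highest-degree nodes, and agreement with the literature is measured as edge precision and recall. The helper names `adjacency`, `clusters`, `hubs`, and `literature_agreement` are hypothetical.

```python
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency sets from a collection of (node, node) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def clusters(edges):
    """Connected components, found with an iterative depth-first search."""
    adj = adjacency(edges)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def hubs(edges, top=1):
    """Nodes with the highest degree (ties broken alphabetically)."""
    adj = adjacency(edges)
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:top]


def literature_agreement(edges, reference_edges):
    """Precision and recall of network edges against a literature-derived reference edge set."""
    def normalize(es):
        # Treat (a, b) and (b, a) as the same undirected edge.
        return {tuple(sorted(e)) for e in es}

    predicted, reference = normalize(edges), normalize(reference_edges)
    shared = len(predicted & reference)
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(reference) if reference else 0.0
    return precision, recall


# Toy network and toy reference (hypothetical node names, not results from the paper).
edges = {("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")}
reference = {("B", "A"), ("C", "D")}
print(clusters(edges))
print(hubs(edges))
print(literature_agreement(edges, reference))
```

Connected components are the simplest notion of a cluster; a modularity-based community detection method would be a natural substitute when a single large component hides denser sub-networks.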