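019 Kinase similarity: amino acid sequence
🏃🏻 Quick start
You can run this document directly on Bohrium Notebook. First, click the 开始连接 (Connect) button at the top of the interface, then select the bohrium-notebook:05-31 image and a suitable machine configuration. After a short wait, you can start running it.
📖 Source
This Notebook comes from https://github.com/volkamerlab/teachopencadd and was adapted and ported to Bohrium Notebook by 杨合 📨.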
Aim of this talktorial
In this talktorial, we investigate sequence similarity for kinases of interest. The KLIFS API is used to retrieve the residues of the pocket sequence for each kinase.
Two similarity measures are implemented:
- Sequence identity, i.e., the similarity which is based on character-wise discrepancy.
- Sequence similarity, i.e., the similarity which is based on a substitution matrix, thus, reflecting similarities between amino acids.
Note: We focus on similarities between orthosteric kinase binding sites; similarities to allosteric binding sites are not covered.
Contents in Theory
- Kinase dataset
- Kinase similarity descriptor: Sequence
  - Identity score
  - Substitution score
- From similarity matrix to distance matrix
Contents in Practical
- Define the kinases of interest
- Retrieve sequences from KLIFS
- Sequence similarity
  - Identity score
  - Substitution score
- Kinase similarity
  - Visualize similarity as kinase matrix
  - Save kinase similarity matrix
- Kinase distance matrix
  - Save kinase distance matrix
References
- Kinase dataset: Molecules (2021), 26(3), 629
- KLIFS
  - KLIFS URL: https://klifs.net/
  - KLIFS database: Nucleic Acids Res. (2020), 49(D1), D562-D569
- Sequence-based kinase clustering: Manning et al. Science (2002), 298(5600), 1912-1934
- Substitution matrix: PNAS (1992), 89(22), 10915-10919
- Biotite
  - Documentation: https://www.biotite-python.org/index.html
  - BLOSUM matrix
- Sequence logo: http://www.cbs.dtu.dk/biotools/Seq2Logo/
Theory
Kinase dataset
We use the kinase selection as defined in Talktorial T023.
Kinase similarity descriptor: Sequence
As mentioned in the previous talktorial, sequence is often used to assess kinase similarity (see the phylogenetic tree developed by Manning et al. Science (2002), 298(5600), 1912-1934).
In this talktorial, the KLIFS pocket sequence is used for two main reasons:
- The sequence is of fixed length (it contains $85$ residues), which makes the computation of the pairwise similarity between two sequences straightforward.
- The binding pocket is where the action takes place. Why consider the full kinase sequence when an $85$-residue sequence contains most of the relevant information?
Figure 1: The sequence logo shows amino acid binding motifs and sequence profiles such as amino acid depletion. For example, the sequence logo easily visualizes the conserved G-rich loop (position 4-9) and the DFG motif (position 81-83), see Talktorial T023 for more details.
Note for reproducibility: The figure is generated using the Seq2Logo tool available at http://www.cbs.dtu.dk/biotools/Seq2Logo/. The input is the $85$-residue KLIFS binding pocket sequences of the query kinases. All parameters are the default ones. For the graphical layout, the entries Stacks Per Line and Page size are adjusted.
We now describe two ways to compare pocket sequences.
Identity score
A simple way of assessing the similarity between two sequences is to use the so-called identity score. First, a match vector is created: for each position, it checks whether the characters from the two sequences are identical. If they are, the entry is set to $1$, and to $0$ otherwise.
The identity score is computed by summing the elements of the match vector and normalizing the sum by the length of the sequence, which is $85$ in the case of the KLIFS pocket sequence.
Let's consider the identity matrix $I$ below:
| | A | C | D | E | ... |
---|---|---|---|---|---|
A | 1 | 0 | 0 | 0 | ... |
C | 0 | 1 | 0 | 0 | ... |
D | 0 | 0 | 1 | 0 | ... |
E | 0 | 0 | 0 | 1 | ... |
... | ... | ... | ... | ... | ... |
and let $K_i$ and $K_j$ be the pocket sequences of two kinases $i$ and $j$.
We use the following as similarity between kinases $i$ and $j$:
$$
\text{similarity}(i, j) = \frac{1}{85} \sum_{n=1}^{85} I_{K_i^n K_j^n},
$$
where $K_i^n$ represents the amino acid at position $n$ of the sequence of kinase $i$.
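As a small worked example, consider only the first five pocket positions of EGFR and MET as listed in this notebook, KVLGS and EVIGR. The truncation to five positions is chosen purely for illustration; the actual score uses all $85$ positions. The residues agree at positions 2 (V) and 4 (G), so the match vector is $(0, 1, 0, 1, 0)$ and

$$
\frac{1}{5} \sum_{n=1}^{5} \text{match}_n = \frac{0 + 1 + 0 + 1 + 0}{5} = 0.4 .
$$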
Substitution score
Although the identity score is an easy measure of similarity, it does not take into account the rate at which an amino acid may change into another and treats all residues uniformly.
The substitution score takes the changes of the amino acids over evolutionary time into account. It makes use of a substitution matrix, where each entry gives a score between two amino acids.
In this talktorial, we use the BLOSUM substitution matrix PNAS (1992), 89(22), 10915-10919, implemented in biotite.
The BLOSUM substitution matrix is defined as below (the full matrix will be displayed in the Practical part):
| | A | C | D | E | ... |
---|---|---|---|---|---|
A | 4 | 0 | -2 | -1 | ... |
C | 0 | 9 | -3 | -4 | ... |
D | -2 | -3 | 6 | 2 | ... |
E | -1 | -4 | 2 | 5 | ... |
... | ... | ... | ... | ... | ... |
The BLOSUM substitution matrix is symmetric, as shown in the practical part.
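For convenience, we will translate and rescale the substitution matrix $M$ using the following:
$$
\tilde{M}_{ab} = M_{ab} - \min_{c, d} M_{cd}
$$
for translation, and
$$
\hat{M}_{ab} = \frac{\tilde{M}_{ab}}{\sqrt{\tilde{M}_{aa} \, \tilde{M}_{bb}}}
$$
for all amino acids $a$ and $b$ for rescaling, such that
$$
0 \leq \hat{M}_{ab} \leq 1
$$
and
$$
\hat{M}_{aa} = 1 .
$$
We use the following as similarity between kinases $i$ and $j$:
$$
\text{similarity}(i, j) = \frac{1}{85} \sum_{n=1}^{85} \hat{M}_{K_i^n K_j^n},
$$
where $K_i^n$ represents the amino acid at position $n$ of the sequence of kinase $i$ and $\hat{M}$ is the translated and rescaled substitution matrix.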
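To illustrate the two steps with numbers, take the pair D and E from the excerpt above. The symbols $M$, $\tilde{M}$ and $\hat{M}$ (original, translated and rescaled matrix) are notation assumed for this sketch. The translation subtracts the smallest entry of the full matrix, which is $-4$ in the matrix printed in the Practical part, and the rescaling divides by the square root of the product of the two translated diagonal entries, which is consistent with the rescaled values shown there:

$$
\tilde{M}_{DE} = 2 - (-4) = 6, \quad \tilde{M}_{DD} = 6 - (-4) = 10, \quad \tilde{M}_{EE} = 5 - (-4) = 9,
$$

$$
\hat{M}_{DE} = \frac{6}{\sqrt{10 \cdot 9}} \approx 0.63 .
$$

This agrees with the D/E entry of the rescaled matrix displayed in the Practical part.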
From similarity matrix to distance matrix
In order to apply a clustering algorithm to assess the similarity between kinases, it is necessary to start with a distance matrix. A similarity matrix is not, by definition, a distance matrix. For example, the diagonal elements are not zero. For now, we map the similarity matrix to a distance matrix using
$$
\text{distance}(i, j) = 1 - \text{similarity}(i, j) .
$$
See Talktorial T028 for more details.
Practical
Run in demo mode: True
Define the kinases of interest
Let's load the kinase selection as defined in Talktorial T023.
| | kinase | kinase_klifs | uniprot_id | group | full_kinase_name |
---|---|---|---|---|---|
0 | EGFR | EGFR | P00533 | TK | Epidermal growth factor receptor |
1 | ErbB2 | ErbB2 | P04626 | TK | Erythroblastic leukemia viral oncogene homolog 2 |
2 | PI3K | p110a | P42336 | Atypical | Phosphatidylinositol-3-kinase |
3 | VEGFR2 | KDR | P35968 | TK | Vascular endothelial growth factor receptor 2 |
4 | BRAF | BRAF | P15056 | TKL | Rapidly accelerated fibrosarcoma isoform B |
5 | CDK2 | CDK2 | P24941 | CMGC | Cyclin-dependent kinase 2 |
6 | LCK | LCK | P06239 | TK | Lymphocyte-specific protein tyrosine kinase |
7 | MET | MET | P08581 | TK | Mesenchymal-epithelial transition factor |
8 | p38a | p38a | Q16539 | CMGC | p38 mitogen activated protein kinase alpha |
Retrieve sequences from KLIFS
We use the KLIFS API to retrieve the $85$-residue pocket sequence for each kinase.
We create a dictionary made of the kinase name and its associated pocket sequence. This dictionary is used throughout this notebook.
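A minimal sketch of such a dictionary is given below. To keep it self-contained, the pocket sequences are hard-coded from the output shown in this notebook rather than fetched from the KLIFS API, and the names `kinase_sequences` and `kinases_with_gaps` are assumptions made for illustration.

```python
# Pocket sequences hard-coded from the notebook output (assumption: the
# original notebook retrieves them from KLIFS instead).
kinase_sequences = {
    "EGFR": "KVLGSGAFGTVYKVAIKELEILDEAYVMASVDPHVCRLLGIQLITQLMPFGCLLDYVREYLEDRRLVHRDLAARNVLVITDFGLA",
    "ErbB2": "KVLGSGAFGTVYKVAIKVLEILDEAYVMAGVGPYVSRLLGIQLVTQLMPYGCLLDHVREYLEDVRLVHRDLAARNVLVITDFGLA",
    "p110a": "CRIMSSAKRPLWLIIFKNGDLRQDMLTLQIIRLRMLPYGCLVGLIEVVRSHTIMQIQCKATFI--LGIGDRHNSNIMVHIDFGHF",
    "KDR": "KPLGRGAFGQVIEVAVKMLALMSELKILIHIGLNVVNLLGAMVIVEFCKFGNLSTYLRSFLASRKCIHRDLAARNILLICDFGLA",
    "BRAF": "QRIGSGSFGTVYKVAVKMLAFKNEVGVLRKTRVNILLFMGYAIVTQWCEGSSLYHHLHIYLHAKSIIHRDLKSNNIFLIGDFGLA",
    "CDK2": "EKIGEGTYGVVYKVALKKITAIREISLLKELNPNIVKLLDVYLVFEFLH-QDLKKFMDAFCHSHRVLHRDLKPQNLLILADFGLA",
    "LCK": "ERLGAGQFGEVWMVAVKSLAFLAEANLMKQLQQRLVRLYAVYIITEYMENGSLVDFLKTFIEERNYIHRDLRAANILVIADFGLA",
    "MET": "EVIGRGHFGCVYHCAVKSLQFLTEGIIMKDFSPNVLSLLGILVVLPYMKHGDLRNFIRNYLASKKFVHRDLAARNCMLVADFGLA",
    "p38a": "SPVGSGAYGSVCAVAVKKLRTYRELRLLKHMKENVIGLLDVYLVTHLMG-ADLNNIVKCYIHSADIIHRDLKPSNLAVILDFGLA",
}

# Every KLIFS pocket sequence is expected to have 85 positions.
if any(len(sequence) != 85 for sequence in kinase_sequences.values()):
    raise ValueError("All pocket sequences must contain 85 positions.")

# Kinases whose pocket contains missing residues, denoted by "-".
kinases_with_gaps = [
    name for name, sequence in kinase_sequences.items() if "-" in sequence
]

for name, sequence in kinase_sequences.items():
    print(f"{name:<6} {sequence}")
```

Checking the length up front is a cheap safeguard, since the position-wise scores used later only make sense for sequences of equal length.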
- EGFR: KVLGSGAFGTVYKVAIKELEILDEAYVMASVDPHVCRLLGIQLITQLMPFGCLLDYVREYLEDRRLVHRDLAARNVLVITDFGLA
- ErbB2: KVLGSGAFGTVYKVAIKVLEILDEAYVMAGVGPYVSRLLGIQLVTQLMPYGCLLDHVREYLEDVRLVHRDLAARNVLVITDFGLA
- p110a: CRIMSSAKRPLWLIIFKNGDLRQDMLTLQIIRLRMLPYGCLVGLIEVVRSHTIMQIQCKATFI--LGIGDRHNSNIMVHIDFGHF
- KDR: KPLGRGAFGQVIEVAVKMLALMSELKILIHIGLNVVNLLGAMVIVEFCKFGNLSTYLRSFLASRKCIHRDLAARNILLICDFGLA
- BRAF: QRIGSGSFGTVYKVAVKMLAFKNEVGVLRKTRVNILLFMGYAIVTQWCEGSSLYHHLHIYLHAKSIIHRDLKSNNIFLIGDFGLA
- CDK2: EKIGEGTYGVVYKVALKKITAIREISLLKELNPNIVKLLDVYLVFEFLH-QDLKKFMDAFCHSHRVLHRDLKPQNLLILADFGLA
- LCK: ERLGAGQFGEVWMVAVKSLAFLAEANLMKQLQQRLVRLYAVYIITEYMENGSLVDFLKTFIEERNYIHRDLRAANILVIADFGLA
- MET: EVIGRGHFGCVYHCAVKSLQFLTEGIIMKDFSPNVLSLLGILVVLPYMKHGDLRNFIRNYLASKKFVHRDLAARNCMLVADFGLA
- p38a: SPVGSGAYGSVCAVAVKKLRTYRELRLLKHMKENVIGLLDVYLVTHLMG-ADLNNIVKCYIHSADIIHRDLKPSNLAVILDFGLA
As shown in the cell above, some sequences have missing residues, denoted by "-". Let's plot these sequences as a heatmap for a quick visual overview of the conserved regions.
![](https://bohrium.oss-cn-zhangjiakou.aliyuncs.com/article/110155/7b556e1d1ad94beb82abde35750033e9/d858ad558ec946d38726bc3cb8d9dde3.png)
Sequence similarity
Given two kinases, we create functions which account for identity or substitution similarity, as described in the Theory part.
Identity score
We first define a function which compares element-wise characters in two sequences.
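A minimal sketch of such a function is shown below. The name `identity_score` and the length check are assumptions for illustration; the two sequences are the EGFR and MET pocket sequences listed in this notebook.

```python
def identity_score(sequence1, sequence2):
    """Return the fraction of positions at which two sequences are identical."""
    if len(sequence1) != len(sequence2):
        raise ValueError("Sequences must have the same length.")
    # Match vector: 1 where the characters are identical, 0 otherwise.
    match_vector = [int(a == b) for a, b in zip(sequence1, sequence2)]
    # Normalize the number of matches by the sequence length.
    return sum(match_vector) / len(match_vector)


egfr = "KVLGSGAFGTVYKVAIKELEILDEAYVMASVDPHVCRLLGIQLITQLMPFGCLLDYVREYLEDRRLVHRDLAARNVLVITDFGLA"
met = "EVIGRGHFGCVYHCAVKSLQFLTEGIIMKDFSPNVLSLLGILVVLPYMKHGDLRNFIRNYLASKKFVHRDLAARNCMLVADFGLA"

# 39 of the 85 positions match, which gives the 0.46 reported in this notebook.
print(f"{identity_score(egfr, met):.2f}")
```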
Substitution score
We now define a function that accounts for the grouping of similar amino acids, and use the biotite library to retrieve the BLOSUM substitution matrix.
The substitution matrix retrieved from biotite is shown below:
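| | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y | B | Z | X | * |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 4 | 0 | -2 | -1 | -2 | 0 | -2 | -1 | -1 | -1 | -1 | -2 | -1 | -1 | -1 | 1 | 0 | 0 | -3 | -2 | -2 | -1 | 0 | -4 |
C | 0 | 9 | -3 | -4 | -2 | -3 | -3 | -1 | -3 | -1 | -1 | -3 | -3 | -3 | -3 | -1 | -1 | -1 | -2 | -2 | -3 | -3 | -2 | -4 |
D | -2 | -3 | 6 | 2 | -3 | -1 | -1 | -3 | -1 | -4 | -3 | 1 | -1 | 0 | -2 | 0 | -1 | -3 | -4 | -3 | 4 | 1 | -1 | -4 |
E | -1 | -4 | 2 | 5 | -3 | -2 | 0 | -3 | 1 | -3 | -2 | 0 | -1 | 2 | 0 | 0 | -1 | -2 | -3 | -2 | 1 | 4 | -1 | -4 |
F | -2 | -2 | -3 | -3 | 6 | -3 | -1 | 0 | -3 | 0 | 0 | -3 | -4 | -3 | -3 | -2 | -2 | -1 | 1 | 3 | -3 | -3 | -1 | -4 |
G | 0 | -3 | -1 | -2 | -3 | 6 | -2 | -4 | -2 | -4 | -3 | 0 | -2 | -2 | -2 | 0 | -2 | -3 | -2 | -3 | -1 | -2 | -1 | -4 |
H | -2 | -3 | -1 | 0 | -1 | -2 | 8 | -3 | -1 | -3 | -2 | 1 | -2 | 0 | 0 | -1 | -2 | -3 | -2 | 2 | 0 | 0 | -1 | -4 |
I | -1 | -1 | -3 | -3 | 0 | -4 | -3 | 4 | -3 | 2 | 1 | -3 | -3 | -3 | -3 | -2 | -1 | 3 | -3 | -1 | -3 | -3 | -1 | -4 |
K | -1 | -3 | -1 | 1 | -3 | -2 | -1 | -3 | 5 | -2 | -1 | 0 | -1 | 1 | 2 | 0 | -1 | -2 | -3 | -2 | 0 | 1 | -1 | -4 |
L | -1 | -1 | -4 | -3 | 0 | -4 | -3 | 2 | -2 | 4 | 2 | -3 | -3 | -2 | -2 | -2 | -1 | 1 | -2 | -1 | -4 | -3 | -1 | -4 |
M | -1 | -1 | -3 | -2 | 0 | -3 | -2 | 1 | -1 | 2 | 5 | -2 | -2 | 0 | -1 | -1 | -1 | 1 | -1 | -1 | -3 | -1 | -1 | -4 |
N | -2 | -3 | 1 | 0 | -3 | 0 | 1 | -3 | 0 | -3 | -2 | 6 | -2 | 0 | 0 | 1 | 0 | -3 | -4 | -2 | 3 | 0 | -1 | -4 |
P | -1 | -3 | -1 | -1 | -4 | -2 | -2 | -3 | -1 | -3 | -2 | -2 | 7 | -1 | -2 | -1 | -1 | -2 | -4 | -3 | -2 | -1 | -2 | -4 |
Q | -1 | -3 | 0 | 2 | -3 | -2 | 0 | -3 | 1 | -2 | 0 | 0 | -1 | 5 | 1 | 0 | -1 | -2 | -2 | -1 | 0 | 3 | -1 | -4 |
R | -1 | -3 | -2 | 0 | -3 | -2 | 0 | -3 | 2 | -2 | -1 | 0 | -2 | 1 | 5 | -1 | -1 | -3 | -3 | -2 | -1 | 0 | -1 | -4 |
S | 1 | -1 | 0 | 0 | -2 | 0 | -1 | -2 | 0 | -2 | -1 | 1 | -1 | 0 | -1 | 4 | 1 | -2 | -3 | -2 | 0 | 0 | 0 | -4 |
T | 0 | -1 | -1 | -1 | -2 | -2 | -2 | -1 | -1 | -1 | -1 | 0 | -1 | -1 | -1 | 1 | 5 | 0 | -2 | -2 | -1 | -1 | 0 | -4 |
V | 0 | -1 | -3 | -2 | -1 | -3 | -3 | 3 | -2 | 1 | 1 | -3 | -2 | -2 | -3 | -2 | 0 | 4 | -3 | -1 | -3 | -2 | -1 | -4 |
W | -3 | -2 | -4 | -3 | 1 | -2 | -2 | -3 | -3 | -2 | -1 | -4 | -4 | -2 | -3 | -3 | -2 | -3 | 11 | 2 | -4 | -3 | -2 | -4 |
Y | -2 | -2 | -3 | -2 | 3 | -3 | 2 | -1 | -2 | -1 | -1 | -2 | -3 | -1 | -2 | -2 | -2 | -1 | 2 | 7 | -3 | -2 | -1 | -4 |
B | -2 | -3 | 4 | 1 | -3 | -1 | 0 | -3 | 0 | -4 | -3 | 3 | -2 | 0 | -1 | 0 | -1 | -3 | -4 | -3 | 4 | 1 | -1 | -4 |
Z | -1 | -3 | 1 | 4 | -3 | -2 | 0 | -3 | 1 | -3 | -1 | 0 | -1 | 3 | 0 | 0 | -1 | -2 | -3 | -2 | 1 | 4 | -1 | -4 |
X | 0 | -2 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -2 | -1 | -1 | 0 | 0 | -1 | -2 | -1 | -1 | -1 | -1 | -4 |
\* | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | -4 | 1 |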
Check for symmetry:
True
Let's perform the translation-rescaling step we discussed in the Theory part.
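The step can be sketched on a small excerpt of the matrix. The function name `translate_and_rescale` is an assumption, and the rescaling used here (dividing by the square root of the product of the two translated diagonal entries) is the one consistent with the rescaled values shown in this notebook. The excerpt for A, C, D and E happens to contain the minimum of the full matrix ($-4$), so its rescaled entries coincide with those of the full matrix.

```python
import math

# Excerpt of the BLOSUM matrix for A, C, D and E (values from the matrix above).
letters = ["A", "C", "D", "E"]
blosum_excerpt = [
    [4, 0, -2, -1],
    [0, 9, -3, -4],
    [-2, -3, 6, 2],
    [-1, -4, 2, 5],
]


def translate_and_rescale(matrix):
    """Shift entries so the minimum is 0, then scale so the diagonal is 1."""
    minimum = min(min(row) for row in matrix)
    translated = [[value - minimum for value in row] for row in matrix]
    size = len(translated)
    return [
        [
            translated[i][j] / math.sqrt(translated[i][i] * translated[j][j])
            for j in range(size)
        ]
        for i in range(size)
    ]


rescaled = translate_and_rescale(blosum_excerpt)
for letter, row in zip(letters, rescaled):
    print(letter, " ".join(f"{value:.2f}" for value in row))
```

Note that this sketch assumes strictly positive diagonal entries after translation, which holds for the values used here.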
Let's take a look at the translated and rescaled version of the substitution matrix.
| | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y | B | Z | X | * |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 1.00 | 0.39 | 0.22 | 0.35 | 0.22 | 0.45 | 0.20 | 0.38 | 0.35 | 0.38 | 0.35 | 0.22 | 0.32 | 0.35 | 0.35 | 0.62 | 0.47 | 0.50 | 0.09 | 0.21 | 0.25 | 0.38 | 0.82 | 0.00 |
C | 0.39 | 1.00 | 0.09 | 0.00 | 0.18 | 0.09 | 0.08 | 0.29 | 0.09 | 0.29 | 0.28 | 0.09 | 0.08 | 0.09 | 0.09 | 0.29 | 0.28 | 0.29 | 0.14 | 0.17 | 0.10 | 0.10 | 0.32 | 0.00 |
D | 0.22 | 0.09 | 1.00 | 0.63 | 0.10 | 0.30 | 0.27 | 0.11 | 0.32 | 0.00 | 0.11 | 0.50 | 0.29 | 0.42 | 0.21 | 0.45 | 0.32 | 0.11 | 0.00 | 0.10 | 0.89 | 0.56 | 0.55 | 0.00 |
E | 0.35 | 0.00 | 0.63 | 1.00 | 0.11 | 0.21 | 0.38 | 0.12 | 0.56 | 0.12 | 0.22 | 0.42 | 0.30 | 0.67 | 0.44 | 0.47 | 0.33 | 0.24 | 0.09 | 0.20 | 0.59 | 0.94 | 0.58 | 0.00 |
F | 0.22 | 0.18 | 0.10 | 0.11 | 1.00 | 0.10 | 0.27 | 0.45 | 0.11 | 0.45 | 0.42 | 0.10 | 0.00 | 0.11 | 0.11 | 0.22 | 0.21 | 0.34 | 0.41 | 0.67 | 0.11 | 0.11 | 0.55 | 0.00 |
G | 0.45 | 0.09 | 0.30 | 0.21 | 0.10 | 1.00 | 0.18 | 0.00 | 0.21 | 0.00 | 0.11 | 0.40 | 0.19 | 0.21 | 0.21 | 0.45 | 0.21 | 0.11 | 0.16 | 0.10 | 0.34 | 0.22 | 0.55 | 0.00 |
H | 0.20 | 0.08 | 0.27 | 0.38 | 0.27 | 0.18 | 1.00 | 0.10 | 0.29 | 0.10 | 0.19 | 0.46 | 0.17 | 0.38 | 0.38 | 0.31 | 0.19 | 0.10 | 0.15 | 0.52 | 0.41 | 0.41 | 0.50 | 0.00 |
I | 0.38 | 0.29 | 0.11 | 0.12 | 0.45 | 0.00 | 0.10 | 1.00 | 0.12 | 0.75 | 0.59 | 0.11 | 0.11 | 0.12 | 0.12 | 0.25 | 0.35 | 0.88 | 0.09 | 0.32 | 0.12 | 0.12 | 0.61 | 0.00 |
K | 0.35 | 0.09 | 0.32 | 0.56 | 0.11 | 0.21 | 0.29 | 0.12 | 1.00 | 0.24 | 0.33 | 0.42 | 0.30 | 0.56 | 0.67 | 0.47 | 0.33 | 0.24 | 0.09 | 0.20 | 0.47 | 0.59 | 0.58 | 0.00 |
L | 0.38 | 0.29 | 0.00 | 0.12 | 0.45 | 0.00 | 0.10 | 0.75 | 0.24 | 1.00 | 0.71 | 0.11 | 0.11 | 0.24 | 0.24 | 0.25 | 0.35 | 0.62 | 0.18 | 0.32 | 0.00 | 0.12 | 0.61 | 0.00 |
M | 0.35 | 0.28 | 0.11 | 0.22 | 0.42 | 0.11 | 0.19 | 0.59 | 0.33 | 0.71 | 1.00 | 0.21 | 0.20 | 0.44 | 0.33 | 0.35 | 0.33 | 0.59 | 0.26 | 0.30 | 0.12 | 0.35 | 0.58 | 0.00 |
N | 0.22 | 0.09 | 0.50 | 0.42 | 0.10 | 0.40 | 0.46 | 0.11 | 0.42 | 0.11 | 0.21 | 1.00 | 0.19 | 0.42 | 0.42 | 0.56 | 0.42 | 0.11 | 0.00 | 0.19 | 0.78 | 0.45 | 0.55 | 0.00 |
P | 0.32 | 0.08 | 0.29 | 0.30 | 0.00 | 0.19 | 0.17 | 0.11 | 0.30 | 0.11 | 0.20 | 0.19 | 1.00 | 0.30 | 0.20 | 0.32 | 0.30 | 0.21 | 0.00 | 0.09 | 0.21 | 0.32 | 0.35 | 0.00 |
Q | 0.35 | 0.09 | 0.42 | 0.67 | 0.11 | 0.21 | 0.38 | 0.12 | 0.56 | 0.24 | 0.44 | 0.42 | 0.30 | 1.00 | 0.56 | 0.47 | 0.33 | 0.24 | 0.17 | 0.30 | 0.47 | 0.82 | 0.58 | 0.00 |
R | 0.35 | 0.09 | 0.21 | 0.44 | 0.11 | 0.21 | 0.38 | 0.12 | 0.67 | 0.24 | 0.33 | 0.42 | 0.20 | 0.56 | 1.00 | 0.35 | 0.33 | 0.12 | 0.09 | 0.20 | 0.35 | 0.47 | 0.58 | 0.00 |
S | 0.62 | 0.29 | 0.45 | 0.47 | 0.22 | 0.45 | 0.31 | 0.25 | 0.47 | 0.25 | 0.35 | 0.56 | 0.32 | 0.47 | 0.35 | 1.00 | 0.59 | 0.25 | 0.09 | 0.21 | 0.50 | 0.50 | 0.82 | 0.00 |
T | 0.47 | 0.28 | 0.32 | 0.33 | 0.21 | 0.21 | 0.19 | 0.35 | 0.33 | 0.35 | 0.33 | 0.42 | 0.30 | 0.33 | 0.33 | 0.59 | 1.00 | 0.47 | 0.17 | 0.20 | 0.35 | 0.35 | 0.77 | 0.00 |
V | 0.50 | 0.29 | 0.11 | 0.24 | 0.34 | 0.11 | 0.10 | 0.88 | 0.24 | 0.62 | 0.59 | 0.11 | 0.21 | 0.24 | 0.12 | 0.25 | 0.47 | 1.00 | 0.09 | 0.32 | 0.12 | 0.25 | 0.61 | 0.00 |
W | 0.09 | 0.14 | 0.00 | 0.09 | 0.41 | 0.16 | 0.15 | 0.09 | 0.09 | 0.18 | 0.26 | 0.00 | 0.00 | 0.17 | 0.09 | 0.09 | 0.17 | 0.09 | 1.00 | 0.47 | 0.00 | 0.09 | 0.30 | 0.00 |
Y | 0.21 | 0.17 | 0.10 | 0.20 | 0.67 | 0.10 | 0.52 | 0.32 | 0.20 | 0.32 | 0.30 | 0.19 | 0.09 | 0.30 | 0.20 | 0.21 | 0.20 | 0.32 | 0.47 | 1.00 | 0.11 | 0.21 | 0.52 | 0.00 |
B | 0.25 | 0.10 | 0.89 | 0.59 | 0.11 | 0.34 | 0.41 | 0.12 | 0.47 | 0.00 | 0.12 | 0.78 | 0.21 | 0.47 | 0.35 | 0.50 | 0.35 | 0.12 | 0.00 | 0.11 | 1.00 | 0.62 | 0.61 | 0.00 |
Z | 0.38 | 0.10 | 0.56 | 0.94 | 0.11 | 0.22 | 0.41 | 0.12 | 0.59 | 0.12 | 0.35 | 0.45 | 0.32 | 0.82 | 0.47 | 0.50 | 0.35 | 0.25 | 0.09 | 0.21 | 0.62 | 1.00 | 0.61 | 0.00 |
X | 0.82 | 0.32 | 0.55 | 0.58 | 0.55 | 0.55 | 0.50 | 0.61 | 0.58 | 0.61 | 0.58 | 0.55 | 0.35 | 0.58 | 0.58 | 0.82 | 0.77 | 0.61 | 0.30 | 0.52 | 0.61 | 0.61 | 1.00 | 0.00 |
* | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
Let's define a function that calculates the substitution scores between two sequences (will make use of the previously defined function).
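A possible sketch of such a function follows, again on an excerpt of the matrix (A, C, D, E and the "*" symbol) so that the block stays self-contained. The name `substitution_score` and the treatment of missing residues are assumptions: here a "-" is looked up as "*", which scores $0$ against every amino acid and $1$ against itself in the rescaled matrix.

```python
import math

# Excerpt of the BLOSUM matrix printed earlier: A, C, D, E and the "*" symbol.
letters = "ACDE*"
raw = [
    [4, 0, -2, -1, -4],
    [0, 9, -3, -4, -4],
    [-2, -3, 6, 2, -4],
    [-1, -4, 2, 5, -4],
    [-4, -4, -4, -4, 1],
]

# Translate so the minimum is 0, then rescale so the diagonal is 1.
minimum = min(min(row) for row in raw)
shifted = [[value - minimum for value in row] for row in raw]
rescaled = {
    a: {
        b: shifted[i][j] / math.sqrt(shifted[i][i] * shifted[j][j])
        for j, b in enumerate(letters)
    }
    for i, a in enumerate(letters)
}


def substitution_score(sequence1, sequence2, matrix):
    """Average the rescaled substitution values over all aligned positions."""
    if len(sequence1) != len(sequence2):
        raise ValueError("Sequences must have the same length.")
    # Assumption: a missing residue "-" is scored like the "*" symbol.
    sequence1 = sequence1.replace("-", "*")
    sequence2 = sequence2.replace("-", "*")
    scores = [matrix[a][b] for a, b in zip(sequence1, sequence2)]
    return sum(scores) / len(scores)


identical = substitution_score("ACDE", "ACDE", rescaled)
one_substitution = substitution_score("ACDE", "ADDE", rescaled)
one_gap = substitution_score("AC-E", "ACDE", rescaled)
print(identical, one_substitution, one_gap)
```

With this convention a single C to D substitution lowers the score less than a missing residue does, because the C/D entry of the rescaled matrix is small but not zero.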
Kinase similarity
Given two kinases, we create a function which computes the sequence similarity between them using one of the two measures, the identity or the substitution.
Let's look at the sequence similarity between two kinases (see also Figure 2):
The sequences are:
- EGFR: KVLGSGAFGTVYKVAIKELEILDEAYVMASVDPHVCRLLGIQLITQLMPFGCLLDYVREYLEDRRLVHRDLAARNVLVITDFGLA
- MET: EVIGRGHFGCVYHCAVKSLQFLTEGIIMKDFSPNVLSLLGILVVLPYMKHGDLRNFIRNYLASKKFVHRDLAARNCMLVADFGLA
Pocket sequence similarity between EGFR and MET kinases: 0.46 using identity.
Pocket sequence similarity between EGFR and MET kinases: 0.71 using substitution.
Figure 2: Sequences and sequence similarity between the kinases EGFR and MET.
We can also look at self-similarity:
Pocket sequence similarity between EGFR itself: 1.00 using identity.
Pocket sequence similarity between EGFR itself: 1.00 using substitution.
As expected, the similarity between a kinase and itself leads to the highest possible score: $1$.
Visualize similarity as kinase matrix
We visualize the similarity matrices using identity and substitution:
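Building such a matrix amounts to evaluating a pairwise score for every pair of kinases. The sketch below does this for three of the kinases with the identity score only; the names `pairwise_matrix` and `identity_score` are assumptions, and a nested dictionary stands in for the data frame displayed in this notebook.

```python
kinase_sequences = {
    "EGFR": "KVLGSGAFGTVYKVAIKELEILDEAYVMASVDPHVCRLLGIQLITQLMPFGCLLDYVREYLEDRRLVHRDLAARNVLVITDFGLA",
    "ErbB2": "KVLGSGAFGTVYKVAIKVLEILDEAYVMAGVGPYVSRLLGIQLVTQLMPYGCLLDHVREYLEDVRLVHRDLAARNVLVITDFGLA",
    "MET": "EVIGRGHFGCVYHCAVKSLQFLTEGIIMKDFSPNVLSLLGILVVLPYMKHGDLRNFIRNYLASKKFVHRDLAARNCMLVADFGLA",
}


def identity_score(sequence1, sequence2):
    """Fraction of positions at which two equal-length sequences match."""
    matches = sum(a == b for a, b in zip(sequence1, sequence2))
    return matches / len(sequence1)


def pairwise_matrix(sequences, score):
    """Evaluate `score` for every ordered pair of entries in `sequences`."""
    return {
        name1: {
            name2: score(sequence1, sequence2)
            for name2, sequence2 in sequences.items()
        }
        for name1, sequence1 in sequences.items()
    }


similarity_matrix = pairwise_matrix(kinase_sequences, identity_score)
for name1, row in similarity_matrix.items():
    print(f"{name1:<6}", " ".join(f"{value:.3f}" for value in row.values()))
```

Passing the score as a function argument means the same helper could be reused with a substitution-based score.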
Kinase similarity matrix: Identity
| | EGFR | ErbB2 | p110a | KDR | BRAF | CDK2 | LCK | MET | p38a |
---|---|---|---|---|---|---|---|---|---|
EGFR | 1.000000 | 0.894118 | 0.117647 | 0.470588 | 0.376471 | 0.317647 | 0.447059 | 0.458824 | 0.388235 |
ErbB2 | 0.894118 | 1.000000 | 0.117647 | 0.435294 | 0.400000 | 0.329412 | 0.423529 | 0.470588 | 0.400000 |
p110a | 0.117647 | 0.117647 | 1.000000 | 0.152941 | 0.152941 | 0.105882 | 0.141176 | 0.105882 | 0.141176 |
KDR | 0.470588 | 0.435294 | 0.152941 | 1.000000 | 0.400000 | 0.341176 | 0.435294 | 0.470588 | 0.388235 |
BRAF | 0.376471 | 0.400000 | 0.152941 | 0.400000 | 1.000000 | 0.329412 | 0.388235 | 0.376471 | 0.376471 |
CDK2 | 0.317647 | 0.329412 | 0.105882 | 0.341176 | 0.329412 | 1.000000 | 0.376471 | 0.364706 | 0.470588 |
LCK | 0.447059 | 0.423529 | 0.141176 | 0.435294 | 0.388235 | 0.376471 | 1.000000 | 0.400000 | 0.388235 |
MET | 0.458824 | 0.470588 | 0.105882 | 0.470588 | 0.376471 | 0.364706 | 0.400000 | 1.000000 | 0.364706 |
p38a | 0.388235 | 0.400000 | 0.141176 | 0.388235 | 0.376471 | 0.470588 | 0.388235 | 0.364706 | 1.000000 |
| | EGFR | ErbB2 | p110a | KDR | BRAF | CDK2 | LCK | MET | p38a |
---|---|---|---|---|---|---|---|---|---|
EGFR | 1.000 | 0.894 | 0.118 | 0.471 | 0.376 | 0.318 | 0.447 | 0.459 | 0.388 |
ErbB2 | 0.894 | 1.000 | 0.118 | 0.435 | 0.400 | 0.329 | 0.424 | 0.471 | 0.400 |
p110a | 0.118 | 0.118 | 1.000 | 0.153 | 0.153 | 0.106 | 0.141 | 0.106 | 0.141 |
KDR | 0.471 | 0.435 | 0.153 | 1.000 | 0.400 | 0.341 | 0.435 | 0.471 | 0.388 |
BRAF | 0.376 | 0.400 | 0.153 | 0.400 | 1.000 | 0.329 | 0.388 | 0.376 | 0.376 |
CDK2 | 0.318 | 0.329 | 0.106 | 0.341 | 0.329 | 1.000 | 0.376 | 0.365 | 0.471 |
LCK | 0.447 | 0.424 | 0.141 | 0.435 | 0.388 | 0.376 | 1.000 | 0.400 | 0.388 |
MET | 0.459 | 0.471 | 0.106 | 0.471 | 0.376 | 0.365 | 0.400 | 1.000 | 0.365 |
p38a | 0.388 | 0.400 | 0.141 | 0.388 | 0.376 | 0.471 | 0.388 | 0.365 | 1.000 |
Kinase similarity matrix: Substitution
| | EGFR | ErbB2 | p110a | KDR | BRAF | CDK2 | LCK | MET | p38a |
---|---|---|---|---|---|---|---|---|---|
EGFR | 1.000000 | 0.940963 | 0.427062 | 0.716047 | 0.655028 | 0.648447 | 0.711321 | 0.711258 | 0.644644 |
ErbB2 | 0.940963 | 1.000000 | 0.413638 | 0.702118 | 0.654600 | 0.630308 | 0.685075 | 0.697967 | 0.635173 |
p110a | 0.427062 | 0.413638 | 1.000000 | 0.422705 | 0.436459 | 0.423994 | 0.451336 | 0.393699 | 0.431194 |
KDR | 0.716047 | 0.702118 | 0.422705 | 1.000000 | 0.671268 | 0.653379 | 0.687648 | 0.713806 | 0.653444 |
BRAF | 0.655028 | 0.654600 | 0.436459 | 0.671268 | 1.000000 | 0.646755 | 0.672933 | 0.638158 | 0.637912 |
CDK2 | 0.648447 | 0.630308 | 0.423994 | 0.653379 | 0.646755 | 1.000000 | 0.681253 | 0.656025 | 0.723093 |
LCK | 0.711321 | 0.685075 | 0.451336 | 0.687648 | 0.672933 | 0.681253 | 1.000000 | 0.690879 | 0.662581 |
MET | 0.711258 | 0.697967 | 0.393699 | 0.713806 | 0.638158 | 0.656025 | 0.690879 | 1.000000 | 0.629355 |
p38a | 0.644644 | 0.635173 | 0.431194 | 0.653444 | 0.637912 | 0.723093 | 0.662581 | 0.629355 | 1.000000 |
| | EGFR | ErbB2 | p110a | KDR | BRAF | CDK2 | LCK | MET | p38a |
---|---|---|---|---|---|---|---|---|---|
EGFR | 1.000 | 0.941 | 0.427 | 0.716 | 0.655 | 0.648 | 0.711 | 0.711 | 0.645 |
ErbB2 | 0.941 | 1.000 | 0.414 | 0.702 | 0.655 | 0.630 | 0.685 | 0.698 | 0.635 |
p110a | 0.427 | 0.414 | 1.000 | 0.423 | 0.436 | 0.424 | 0.451 | 0.394 | 0.431 |
KDR | 0.716 | 0.702 | 0.423 | 1.000 | 0.671 | 0.653 | 0.688 | 0.714 | 0.653 |
BRAF | 0.655 | 0.655 | 0.436 | 0.671 | 1.000 | 0.647 | 0.673 | 0.638 | 0.638 |
CDK2 | 0.648 | 0.630 | 0.424 | 0.653 | 0.647 | 1.000 | 0.681 | 0.656 | 0.723 |
LCK | 0.711 | 0.685 | 0.451 | 0.688 | 0.673 | 0.681 | 1.000 | 0.691 | 0.663 |
MET | 0.711 | 0.698 | 0.394 | 0.714 | 0.638 | 0.656 | 0.691 | 1.000 | 0.629 |
p38a | 0.645 | 0.635 | 0.431 | 0.653 | 0.638 | 0.723 | 0.663 | 0.629 | 1.000 |
When we compare the matrices calculated based on the identity and substitution score, the overall pattern is similar, while the values are generally higher using the substitution score.
Note: For all downstream analysis, we will only consider the kinase similarity matrix calculated based on the substitution matrix.
Save kinase similarity matrix
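One way to store the matrix is as a CSV file with the kinase names as row and column labels. The sketch below uses Python's `csv` module on a three-kinase excerpt of the substitution-based matrix above; the helper names, the file name `kinase_similarity_matrix.csv` and the temporary output directory are all hypothetical choices for illustration.

```python
import csv
import tempfile
from pathlib import Path

names = ["EGFR", "ErbB2", "p110a"]
# Values copied from the substitution-based similarity matrix above.
similarity = [
    [1.000000, 0.940963, 0.427062],
    [0.940963, 1.000000, 0.413638],
    [0.427062, 0.413638, 1.000000],
]


def save_matrix(path, labels, matrix):
    """Write a labeled square matrix to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([""] + labels)
        for label, row in zip(labels, matrix):
            writer.writerow([label] + row)


def load_matrix(path):
    """Read back the labels and values written by `save_matrix`."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    labels = rows[0][1:]
    matrix = [[float(value) for value in row[1:]] for row in rows[1:]]
    return labels, matrix


# Hypothetical location; a temporary directory avoids touching the project.
output_path = Path(tempfile.mkdtemp()) / "kinase_similarity_matrix.csv"
save_matrix(output_path, names, similarity)
loaded_names, loaded_similarity = load_matrix(output_path)
```

Reading the file back right after writing it is a simple check that the labels and values survive the round trip.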
Kinase distance matrix
Since all entries are between $0$ and $1$, the similarity matrix is mapped to a distance matrix by taking $1 - \text{similarity}$:
The values of the similarity matrix lie between: 0.39 and 1.00
| | EGFR | ErbB2 | p110a | KDR | BRAF | CDK2 | LCK | MET | p38a |
---|---|---|---|---|---|---|---|---|---|
EGFR | 0.000 | 0.059 | 0.573 | 0.284 | 0.345 | 0.352 | 0.289 | 0.289 | 0.355 |
ErbB2 | 0.059 | 0.000 | 0.586 | 0.298 | 0.345 | 0.370 | 0.315 | 0.302 | 0.365 |
p110a | 0.573 | 0.586 | 0.000 | 0.577 | 0.564 | 0.576 | 0.549 | 0.606 | 0.569 |
KDR | 0.284 | 0.298 | 0.577 | 0.000 | 0.329 | 0.347 | 0.312 | 0.286 | 0.347 |
BRAF | 0.345 | 0.345 | 0.564 | 0.329 | 0.000 | 0.353 | 0.327 | 0.362 | 0.362 |
CDK2 | 0.352 | 0.370 | 0.576 | 0.347 | 0.353 | 0.000 | 0.319 | 0.344 | 0.277 |
LCK | 0.289 | 0.315 | 0.549 | 0.312 | 0.327 | 0.319 | 0.000 | 0.309 | 0.337 |
MET | 0.289 | 0.302 | 0.606 | 0.286 | 0.362 | 0.344 | 0.309 | 0.000 | 0.371 |
p38a | 0.355 | 0.365 | 0.569 | 0.347 | 0.362 | 0.277 | 0.337 | 0.371 | 0.000 |
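The mapping behind the table above can be sketched as follows, using a three-kinase excerpt of the substitution-based similarity matrix. The function name `similarity_to_distance` and the range check are assumptions added for illustration.

```python
names = ["EGFR", "ErbB2", "p110a"]
# Values copied from the substitution-based similarity matrix above.
similarity = [
    [1.000000, 0.940963, 0.427062],
    [0.940963, 1.000000, 0.413638],
    [0.427062, 0.413638, 1.000000],
]


def similarity_to_distance(matrix):
    """Map similarities in [0, 1] to distances via 1 - similarity."""
    lowest = min(min(row) for row in matrix)
    highest = max(max(row) for row in matrix)
    if lowest < 0 or highest > 1:
        raise ValueError("Similarities must lie between 0 and 1.")
    return [[1 - value for value in row] for row in matrix]


distance = similarity_to_distance(similarity)
for name, row in zip(names, distance):
    print(f"{name:<6}", " ".join(f"{value:.3f}" for value in row))
```

The diagonal becomes zero and the most similar pair (EGFR and ErbB2) ends up with the smallest distance, as in the table.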
Save kinase distance matrix
Discussion
In this talktorial, we investigate how sequences can be used to measure similarity between kinases. The focus is set on the pocket sequence, which is retrieved from KLIFS. Sequence similarity can be assessed using two scores: 1. the identity, which treats all amino acids uniformly, and 2. the substitution, which takes into account the rate of change of residues over evolutionary time.
The kinase similarity matrix above will be reloaded in Talktorial T028, where we compare kinase similarities from different perspectives, including the pocket sequence perspective we have talked about in this talktorial.
Reference
https://github.com/volkamerlab/teachopencadd
Reprint statement
Original title: Kinase similarity: Sequence
Note: This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.
Authors:
- Talia B. Kimber, 2021, Volkamer lab, Charité
- Dominique Sydow, 2021, Volkamer lab, Charité
- Andrea Volkamer, 2021, Volkamer lab, Charité