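WGCNA: An R Package for Analyzing Gene Expression Data

Tags: bioinformatics, R package, gene co-expression analysis, hub genes

Author: 孙楠

Published 2023-11-21

Recommended image: bio-r-notebook:v1

Recommended machine type: c4_m16_cpu

Dataset: WGCNA依赖数据 (v1)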


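WGCNA, short for Weighted Gene Co-Expression Network Analysis, is a bioinformatics method for analyzing gene expression data. Its main goal is to find co-expression patterns among genes, that is, genes whose expression rises together or falls together, and to group those genes into biologically meaningful co-expression modules. For example, genes that regulate anthocyanin synthesis may cluster in one module, while genes that regulate chlorophyll synthesis may cluster in another. WGCNA can also be used to identify gene networks associated with specific biological processes, disease states, or experimental conditions, and to find the hub genes within them.

Original paper: Peter Langfelder, Steve Horvath. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9.1 (2008): 1-13. https://doi.org/10.1186/1471-2105-9-559.

📖 Getting started
This document can be run directly on Bohrium Notebook. Click the button at the top of the page to connect, select the `bio-r-notebook:v1` image and the `c4_m16_cpu` node configuration, wait a moment, and then choose the `R kernel` to run it.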

[1]

library(WGCNA)

Loading required package: dynamicTreeCut

Loading required package: fastcluster


Attaching package: ‘fastcluster’


The following object is masked from ‘package:stats’:

    hclust





Attaching package: ‘WGCNA’


The following object is masked from ‘package:stats’:

    cor

I. Data import and preprocessing

1. Read in the gene expression data

[2]

femData = read.csv("/bohr/wgcna-ss71/v1/LiverFemale3600.csv")

datExpr0 = as.data.frame(t(femData[, -c(1:8)])) # keep only the expression columns (drop the first 8 annotation columns) and transpose

colnames(datExpr0) = femData$substanceBXH # gene name

rownames(datExpr0) = names(femData)[-c(1:8)] # sample name

head(datExpr0) # rows are samples, columns are genes

A data.frame: 6 × 3600
	MMT00000044	MMT00000046	MMT00000051	MMT00000076	MMT00000080	MMT00000102	MMT00000149	MMT00000159	MMT00000207	MMT00000212	⋯	MMT00082822	MMT00082828	MMT00082829	MMT00082832	MMT00082847	MMT00082850	MMT00082869	MMT00082877	MMT00082899	MMT00082906
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	⋯	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
F2_2	-0.0181000	-0.0773	-0.02260000	-0.00924	-0.04870000	0.17600000	0.07680000	-0.14800000	0.06870000	0.06090000	⋯	0.0135000	-0.15400000	-0.0218000	0.0310000	0.1290000	0.0467000	0.00991000	0.0291000	-0.00927	0.0436000
F2_3	0.0642000	-0.0297	0.06170000	-0.14500	0.05820000	-0.18900000	0.18600000	0.17700000	0.10100000	0.05570000	⋯	-0.0097100	-0.07410000	0.0900000	0.0106000	0.1130000	-0.0252000	0.03190000	0.0408000	-0.12100	0.0827000
F2_14	0.0000644	0.1120	-0.12900000	0.02870	-0.04830000	-0.06500000	0.21400000	-0.13200000	0.10900000	0.19100000	⋯	0.0709000	-0.13900000	0.0277000	-0.1310000	0.2550000	-0.1230000	0.08800000	0.0892000	-0.11400	-0.0872000
F2_15	-0.0580000	-0.0589	0.08710000	-0.04390	-0.03710000	-0.00846000	0.12000000	0.10700000	-0.00858000	-0.12100000	⋯	-0.0313000	-0.07250000	0.0178000	0.0882000	0.0790000	0.0002760	-0.04820000	0.0493000	-0.05010	-0.0390000
F2_19	0.0483000	0.0443	-0.11500000	0.00425	0.02510000	-0.00574000	0.02100000	-0.11900000	0.10500000	0.05410000	⋯	0.0695000	-0.11500000	0.0618000	0.2950000	0.1270000	-0.0560000	-0.02890000	-0.0389000	0.00718	0.0710000
F2_20	-0.1519741	-0.0938	-0.06502607	-0.23610	0.08504274	-0.01807182	0.06222751	-0.05497686	-0.02441415	0.06343181	⋯	0.1743492	-0.09405315	0.1176646	0.1161963	0.1180381	-0.1171272	-0.09774204	-0.0745188	0.31857	0.2047701

2. Check for missing values and identify outliers

Check for missing values:

[3]

gsg = goodSamplesGenes(datExpr0, verbose = 3)

names(gsg)

gsg$allOK

 Flagging genes and samples with too many missing values...
  ..step 1

'goodGenes'
'goodSamples'
'allOK'

TRUE
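
The kind of check summarized by `gsg$allOK` can be sketched in base R on a toy matrix. This is only an illustration of the idea, not the internals of `goodSamplesGenes`: the matrix `expr`, the 50% cutoff `max_missing`, and all variable names are assumptions made for the example.

```r
# Toy expression matrix: rows are samples, columns are genes (made-up values).
expr <- matrix(c(0.1, NA,  0.3,
                 0.2, NA,  0.1,
                 0.0, NA,  NA,
                 0.3, 0.5, 0.2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), c("g1", "g2", "g3")))

max_missing <- 0.5  # hypothetical cutoff: tolerate at most 50% missing entries

# Flag genes first, then samples (using only the genes that passed).
good_genes   <- colMeans(is.na(expr)) <= max_missing
good_samples <- rowMeans(is.na(expr[, good_genes, drop = FALSE])) <= max_missing

all_ok     <- all(good_genes) && all(good_samples)
expr_clean <- expr[good_samples, good_genes, drop = FALSE]
```

In this toy matrix `g2` is missing in three of four samples and is dropped, so `all_ok` is `FALSE`; the other two genes and all four samples pass.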

If `gsg$allOK` is `TRUE`, all genes and samples passed the missing-value check and you can go straight to the next step. If it is `FALSE`, remove the offending genes and samples with the following code:

```r
if (!gsg$allOK)
{
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes) > 0)
    printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
  if (sum(!gsg$goodSamples) > 0)
    printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
  # Remove the offending genes and samples from the data:
  datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
```

Cluster all samples to check for outliers:

[4]

sampleTree = hclust(dist(datExpr0), method = "average")

par(cex = 0.6)

par(mar = c(0,4,2,0))

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)

abline(h = 15, col = "red") # cut height used to trim outlier branches

clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)

table(clust)

One sample is an outlier; remove it:

[5]

keepSamples = (clust==1) # keep the non-outlier samples (clust == 1)

datExpr = datExpr0[keepSamples, ] # expression data with the outlier removed

head(datExpr)

A data.frame: 6 × 3600
	MMT00000044	MMT00000046	MMT00000051	MMT00000076	MMT00000080	MMT00000102	MMT00000149	MMT00000159	MMT00000207	MMT00000212	⋯	MMT00082822	MMT00082828	MMT00082829	MMT00082832	MMT00082847	MMT00082850	MMT00082869	MMT00082877	MMT00082899	MMT00082906
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	⋯	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
F2_2	-0.0181000	-0.0773	-0.02260000	-0.00924	-0.04870000	0.17600000	0.07680000	-0.14800000	0.06870000	0.06090000	⋯	0.0135000	-0.15400000	-0.0218000	0.0310000	0.1290000	0.0467000	0.00991000	0.0291000	-0.00927	0.0436000
F2_3	0.0642000	-0.0297	0.06170000	-0.14500	0.05820000	-0.18900000	0.18600000	0.17700000	0.10100000	0.05570000	⋯	-0.0097100	-0.07410000	0.0900000	0.0106000	0.1130000	-0.0252000	0.03190000	0.0408000	-0.12100	0.0827000
F2_14	0.0000644	0.1120	-0.12900000	0.02870	-0.04830000	-0.06500000	0.21400000	-0.13200000	0.10900000	0.19100000	⋯	0.0709000	-0.13900000	0.0277000	-0.1310000	0.2550000	-0.1230000	0.08800000	0.0892000	-0.11400	-0.0872000
F2_15	-0.0580000	-0.0589	0.08710000	-0.04390	-0.03710000	-0.00846000	0.12000000	0.10700000	-0.00858000	-0.12100000	⋯	-0.0313000	-0.07250000	0.0178000	0.0882000	0.0790000	0.0002760	-0.04820000	0.0493000	-0.05010	-0.0390000
F2_19	0.0483000	0.0443	-0.11500000	0.00425	0.02510000	-0.00574000	0.02100000	-0.11900000	0.10500000	0.05410000	⋯	0.0695000	-0.11500000	0.0618000	0.2950000	0.1270000	-0.0560000	-0.02890000	-0.0389000	0.00718	0.0710000
F2_20	-0.1519741	-0.0938	-0.06502607	-0.23610	0.08504274	-0.01807182	0.06222751	-0.05497686	-0.02441415	0.06343181	⋯	0.1743492	-0.09405315	0.1176646	0.1161963	0.1180381	-0.1171272	-0.09774204	-0.0745188	0.31857	0.2047701
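
The cut-and-keep step above can be illustrated with base R alone. The sketch below uses a made-up six-sample matrix and `cutree()` rather than WGCNA's `cutreeStatic()`, so there is no `minSize` handling and the cluster labels are not the same as in the cell above; the cut height of 1 and all names are assumptions chosen for the toy data.

```r
# Toy expression matrix: five similar samples and one deliberately extreme one.
expr <- rbind(
  S1 = c(0.10, 0.20, 0.15),
  S2 = c(0.12, 0.18, 0.14),
  S3 = c(0.09, 0.22, 0.16),
  S4 = c(0.11, 0.19, 0.13),
  S5 = c(0.10, 0.21, 0.15),
  S6 = c(5.00, 4.80, 5.20)
)

# Average-linkage clustering of samples, as in the notebook.
tree <- hclust(dist(expr), method = "average")

# Cut the tree at a fixed height and keep the largest resulting cluster.
clusters     <- cutree(tree, h = 1)
sizes        <- table(clusters)
main_cluster <- as.integer(names(sizes)[which.max(sizes)])

keep       <- clusters == main_cluster
outliers   <- rownames(expr)[!keep]
expr_clean <- expr[keep, , drop = FALSE]
```

Plotting `tree` with `abline(h = 1, col = "red")` gives the same kind of picture as the dendrogram in the notebook, with `S6` joining the other samples far above the cut line.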

3. Read in the clinical trait data

[6]

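traitData = read.csv("/bohr/wgcna-ss71/v1/ClinicalTraits.csv") # rows are samples, columns are traits and sample info

allTraits = traitData[, -c(31, 16)] # drop columns that are not needed

allTraits = allTraits[, c(2, 11:36) ] # keep the sample ID (Mice) and the numeric trait columns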

head(allTraits)

A data.frame: 6 × 27
	Mice	weight_g	length_cm	ab_fat	other_fat	total_fat	X100xfat_weight	Trigly	Total_Chol	HDL_Chol	⋯	Leptin_pg_ml	Adiponectin	Aortic.lesions	Aneurysm	Aortic_cal_M	Aortic_cal_L	CoronaryArtery_Cal	Myocardial_cal	BMD_all_limbs	BMD_femurs_only
	<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<int>	<int>	<int>	⋯	<dbl>	<dbl>	<int>	<int>	<int>	<int>	<int>	<int>	<dbl>	<dbl>
1	F2_290	36.9	9.9	2.53	2.26	4.79	12.981030	53	1167	50	⋯	245462.00	11.274	496250	16	0	17	0	0	NA	NA
2	F2_291	48.5	10.7	2.90	2.97	5.87	12.103093	61	1230	32	⋯	84420.88	7.099	NA	16	4	0	2	4	0.0548	0.07730
3	F2_292	45.7	10.4	1.04	2.31	3.35	7.330416	41	1285	81	⋯	105889.76	5.795	218500	0	0	11	0	0	0.0554	0.08065
4	F2_293	50.3	10.9	0.91	1.89	2.80	5.566600	271	1299	64	⋯	100398.68	5.495	61250	0	0	0	0	236	0.0597	0.08680
5	F2_294	44.8	9.8	1.22	2.47	3.69	8.236607	114	1410	50	⋯	130846.30	6.868	243750	12	10	0	0	0	NA	NA
6	F2_295	39.2	10.2	3.06	2.49	5.55	14.158163	72	1533	18	⋯	75166.22	17.328	104250	17	2	0	0	0	0.0557	0.07700

Match the clinical trait data to the gene expression data by sample name:

[7]

femaleSamples = rownames(datExpr)

traitRows = match(femaleSamples, allTraits$Mice)

datTraits = allTraits[traitRows, -1]

rownames(datTraits) = allTraits[traitRows, 1]

head(datTraits)

A data.frame: 6 × 26
	weight_g	length_cm	ab_fat	other_fat	total_fat	X100xfat_weight	Trigly	Total_Chol	HDL_Chol	UC	⋯	Leptin_pg_ml	Adiponectin	Aortic.lesions	Aneurysm	Aortic_cal_M	Aortic_cal_L	CoronaryArtery_Cal	Myocardial_cal	BMD_all_limbs	BMD_femurs_only
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<int>	<int>	<int>	<int>	⋯	<dbl>	<dbl>	<int>	<int>	<int>	<int>	<int>	<int>	<dbl>	<dbl>
F2_2	38.0	10.5	3.81	2.78	6.59	17.342105	14	1646	34	668	⋯	NA	NA	224500	56	5	0	0	0	NA	NA
F2_3	33.5	10.8	1.70	2.05	3.75	11.194030	109	1216	27	402	⋯	15148.76	14.339	296250	8	4	NA	0	0	NA	NA
F2_14	33.9	10.0	1.29	1.67	2.96	8.731563	2	834	17	354	⋯	6188.74	15.439	486313	27	12	NA	1	8	NA	NA
F2_15	44.3	10.3	3.62	3.34	6.96	15.711061	71	1565	41	536	⋯	18400.26	11.124	180750	0	0	NA	0	4	NA	NA
F2_19	32.9	9.7	2.08	1.85	3.93	11.945289	55	1060	41	411	⋯	8438.70	16.842	113000	0	0	NA	0	0	NA	NA
F2_20	44.8	10.3	3.72	3.20	6.92	15.446429	34	1172	39	448	⋯	41801.54	13.498	166750	6	0	NA	0	0	NA	NA
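
The alignment relies on `match()` returning, for each expression sample, the row of the trait table that carries the same name. A small base R sketch shows the behavior; it uses three sample names and weights from the table above plus one hypothetical extra mouse (`F2_99`), and the variable names are assumptions for the example.

```r
# Sample order as it appears in the expression data.
expr_samples <- c("F2_2", "F2_3", "F2_14")

# Trait table in a different order, with one mouse that has no expression data.
traits <- data.frame(
  Mice     = c("F2_14", "F2_99", "F2_2", "F2_3"),
  weight_g = c(33.9, 40.0, 38.0, 33.5)
)

# For each expression sample, find its row in the trait table.
rows <- match(expr_samples, traits$Mice)
stopifnot(!anyNA(rows))  # every expression sample must have a trait record

# Reorder the traits to follow the expression data and use sample names as row names.
aligned <- traits[rows, -1, drop = FALSE]
rownames(aligned) <- traits$Mice[rows]
```

A sample without a trait record would produce `NA` in `rows`, so checking `anyNA(rows)` before subsetting is a cheap safeguard.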

Visualize how the clinical traits relate to the sample clustering by rebuilding the sample dendrogram:

[8]

sampleTree2 = hclust(dist(datExpr), method = "average")

traitColors = numbers2colors(datTraits, signed = FALSE) # map trait values to colors: white = low, red = high, grey = missing

plotDendroAndColors(sampleTree2, traitColors,

groupLabels = names(datTraits),

main = "Sample dendrogram and trait heatmap")

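The figure shows the sample dendrogram with a heatmap of the clinical traits underneath. Each color encodes the value of one trait for one sample: white means a low value, red means a high value, and grey means a missing entry. The plot therefore shows how trait values line up with the expression-based clustering of the samples; it does not itself measure a correlation.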

[9]

save(datExpr, datTraits, file = "female_liver_01.RData")

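II. Building the expression network

Building the expression network correctly is critical for the later steps of detecting modules and relating them to trait data to select hub genes. Choosing the soft-thresholding power is the key step of the network topology analysis, and the choice is based on the criterion of approximate scale-free topology. A second concern is that computing the TOM or the adjacency matrix can fail on large datasets, typically because a genes-by-genes matrix no longer fits in memory.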
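
To make the role of the soft-thresholding power concrete, the sketch below builds an unsigned adjacency matrix, $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$, and the topological overlap derived from it, on a toy matrix. It is a base R illustration of the standard definitions, not the code path WGCNA uses internally; `expr`, `soft_adjacency`, and `topological_overlap` are hypothetical names introduced here.

```r
# Toy expression matrix: rows are samples, columns are genes (made-up values).
expr <- cbind(
  g1 = c(1, 2, 3, 4, 5, 6),
  g2 = c(2, 4, 5, 8, 10, 13),  # rises with g1
  g3 = c(6, 5, 4, 3, 2, 1),    # mirror image of g1
  g4 = c(3, 1, 4, 1, 5, 9)     # less tightly related to g1
)

# Unsigned soft-thresholded adjacency: |correlation| raised to a power.
soft_adjacency <- function(expr, power) abs(cor(expr))^power

# Unsigned topological overlap computed from an adjacency matrix.
topological_overlap <- function(adj) {
  a <- adj
  diag(a) <- 0                 # ignore self-connections
  shared <- a %*% a            # connection strength through shared neighbors
  k <- rowSums(a)              # connectivity of each gene
  tom <- (shared + a) / (outer(k, k, pmin) + 1 - a)
  diag(tom) <- 1
  tom
}

adj1 <- soft_adjacency(expr, power = 1)
adj6 <- soft_adjacency(expr, power = 6)
tom6 <- topological_overlap(adj6)

# Higher powers shrink weak correlations much more than strong ones.
round(adj1, 2)
round(adj6, 2)
```

Both matrices are genes by genes, which is why memory use grows quadratically with the number of genes.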

[10]

lnames = load(file = "female_liver_01.RData")

lnames

'datExpr'
'datTraits'

1. Automatic network construction and module detection

Choose the soft-thresholding power:

[11]

powers = c(c(1:10), seq(from = 12, to=20, by=2))

sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

par(mfrow = c(1,2))

cex1 = 0.9

pickSoftThreshold: will use block size 3600.
 pickSoftThreshold: calculating connectivity for given powers...
   ..working on genes 1 through 3600 of 3600
Warning message:
“executing %dopar% sequentially: no parallel backend registered”
   Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1   0.0278  0.345          0.456  747.00  762.0000 1210.0
2      2   0.1260 -0.597          0.843  254.00  251.0000  574.0
3      3   0.3400 -1.030          0.972  111.00  102.0000  324.0
4      4   0.5060 -1.420          0.973   56.50   47.2000  202.0
5      5   0.6810 -1.720          0.940   32.20   25.1000  134.0
6      6   0.9020 -1.500          0.962   19.90   14.5000   94.8
7      7   0.9210 -1.670          0.917   13.20    8.6800   84.1
8      8   0.9040 -1.720          0.876    9.25    5.3900   76.3
9      9   0.8590 -1.700          0.836    6.80    3.5600   70.5
10    10   0.8330 -1.660          0.831    5.19    2.3800   65.8
11    12   0.8530 -1.480          0.911    3.33    1.1500   58.1
12    14   0.8760 -1.380          0.949    2.35    0.5740   51.9
13    16   0.9070 -1.300          0.970    1.77    0.3090   46.8
14    18   0.9120 -1.240          0.973    1.39    0.1670   42.5
15    20   0.9310 -1.210          0.977    1.14    0.0951   38.7

Scale-free topology fit index:

[12]

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],

xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",

main = paste("Scale independence"))

text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],

labels=powers,cex=cex1,col="red")

abline(h=0.90,col="red") # reference line at 0.9; change the height as needed

Choose a power whose signed $R^{2}$ is at or above 0.9. In the plot above, 6 is the first power to reach 0.9, so 6 is a reasonable soft-thresholding power.
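
The same rule can be applied to the fit table directly instead of reading it off the plot. The sketch below copies the `Power`, `SFT.R.sq`, and `slope` columns from the `pickSoftThreshold` output above into a plain data frame; the names `fit`, `signed_r2`, and `chosen`, and the 0.90 cutoff, are choices made for this example.

```r
# Fit indices copied from the pickSoftThreshold output above.
fit <- data.frame(
  Power    = c(1:10, seq(12, 20, by = 2)),
  SFT.R.sq = c(0.0278, 0.1260, 0.3400, 0.5060, 0.6810, 0.9020, 0.9210, 0.9040,
               0.8590, 0.8330, 0.8530, 0.8760, 0.9070, 0.9120, 0.9310),
  slope    = c(0.345, -0.597, -1.030, -1.420, -1.720, -1.500, -1.670, -1.720,
               -1.700, -1.660, -1.480, -1.380, -1.300, -1.240, -1.210)
)

# Signed R^2, the quantity plotted on the y-axis: positive only when the slope is negative.
signed_r2 <- -sign(fit$slope) * fit$SFT.R.sq

# Smallest power whose signed R^2 reaches the cutoff, or NA if none does.
cutoff     <- 0.90
candidates <- fit$Power[signed_r2 >= cutoff]
chosen     <- if (length(candidates) > 0) min(candidates) else NA

chosen  # 6 for the table above
```

Taking the smallest qualifying power keeps as much connectivity as possible while still meeting the scale-free criterion.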

Mean connectivity:

[13]

plot(sft$fitIndices[,1], sft$fitIndices[,5],

xlab="Soft Threshold (power)", ylab="Mean Connectivity", type="n",

main = paste("Mean connectivity"))

text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

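In the plot above, the curve starts to flatten at a power of about 6: raising the power further lowers the mean connectivity only slowly, so at power 6 the network still keeps a reasonable level of connectivity.

Also run the code below. If a suitable power exists, the function recommends it automatically; if the result is NA, no suitable power could be recommended and you need to pick one yourself, following the rules described for the two plots above: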

[14]

sft$powerEstimate

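2. One-step network construction and module detection

Besides the one-step method there are also the step-by-step method and the block-wise method. The main differences among the three are:

One-step: suited to smaller datasets; quick, convenient, and highly automated

Step-by-step: suited to medium-sized datasets; parameters can be customized

Block-wise: suited to larger datasets (more than 5000 genes); modules are detected in separate blocks, with customizable parameters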

[15]

net = blockwiseModules(datExpr, power = 6, # power = 6: the soft-thresholding power chosen above

TOMType = "unsigned", minModuleSize = 30, # minModuleSize = 30: minimum number of genes in a module

reassignThreshold = 0, mergeCutHeight = 0.25, # mergeCutHeight = 0.25: module merging threshold; larger values give fewer modules (important)

numericLabels = TRUE, pamRespectsDendro = FALSE,

saveTOMs = TRUE,

saveTOMFileBase = "femaleMouseTOM",

verbose = 3) # saveTOMs = TRUE with saveTOMFileBase = "femaleMouseTOM" saves the TOM to files whose names start with "femaleMouseTOM"

 Calculating module eigengenes block-wise from all genes
   Flagging genes and samples with too many missing values...
    ..step 1
Cluster size 3600 broken into 2108 1492 
Cluster size 2108 broken into 1126 982 
Done cluster 1126 
Done cluster 982 
Done cluster 2108 
Done cluster 1492 
 ..Working on block 1 .
    TOM calculation: adjacency..
    ..will not use multithreading.
     Fraction of slow calculations: 0.396405
    ..connectivity..
    ..matrix multiplication (system BLAS)..
    ..normalization..
    ..done.
   ..saving TOM for block 1 into file femaleMouseTOM-block.1.RData
 ....clustering..
 ....detecting modules..
 ....calculating module eigengenes..
 ....checking kME in modules..
     ..removing 1 genes from module 1 because their KME is too low.
     ..removing 1 genes from module 7 because their KME is too low.
     ..removing 1 genes from module 8 because their KME is too low.
     ..removing 1 genes from module 21 because their KME is too low.
 ..merging modules that are too close..
     mergeCloseModules: Merging modules whose distance is less than 0.25
       Calculating new MEs...

Check the number of modules and the number of genes in each module:

[16]

table(net$colors)

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
 99 609 460 409 316 312 221 211 157 123 106 100  94  91  77  76  58  47  34

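The genes are divided into 18 modules. The second row gives the number of genes in each module, in descending order: starting from module 1, the module sizes decrease. Label 0 holds the genes that were not assigned to any module.

Use the following code to display the hierarchical clustering dendrogram together with the module color assignments: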

[17]

mergedColors = labels2colors(net$colors)

plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],

"Module colors",

dendroLabels = FALSE, hang = 0.03,

addGuide = TRUE, guideHang = 0.05)

Save the module assignments and the gene information for each module:

[18]

moduleLabels = net$colors

moduleColors = labels2colors(net$colors)

MEs = net$MEs;

geneTree = net$dendrograms[[1]];

save(MEs, moduleLabels, moduleColors, geneTree,

file = "female_liver_02.RData")

III. Relating modules to trait data and identifying important genes

[19]

lnames = load(file = "female_liver_01.RData")

lnames

lnames = load(file = "female_liver_02.RData")

lnames

'datExpr'
'datTraits'

'MEs'
'moduleLabels'
'moduleColors'
'geneTree'

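1. Module-trait associations

This analysis identifies modules that are significantly associated with the trait data. Each module already has an eigengene, so we only need to correlate the eigengenes with the external traits and look for the most significant associations: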

[20]

nGenes = ncol(datExpr)

nSamples = nrow(datExpr)

# recalculate the module eigengenes using the color labels

MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes

MEs = orderMEs(MEs0)

moduleTraitCor = cor(MEs, datTraits, use = "p")

moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)

# color-code each association by its correlation value

# display the correlation coefficient and p-value for each module-trait pair

textMatrix = paste(signif(moduleTraitCor, 2), "\n(",

signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

par(mar = c(6, 8.5, 3, 3))

# display the correlations as a heatmap

labeledHeatmap(Matrix = moduleTraitCor,

xLabels = names(datTraits),

yLabels = names(MEs),

ySymbols = names(MEs),

colorLabels = FALSE,

colors = greenWhiteRed(50),

textMatrix = textMatrix,

setStdMargins = FALSE,

cex.text = 0.5,

zlim = c(-1,1),

main = paste("Module-trait relationships"))

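The redder a cell, the stronger the positive correlation between that module and the trait; the greener a cell, the stronger the negative correlation.

The brown module correlates strongly with body weight, so below we look at how the genes in this module relate to weight.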
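
What the correlation and p-value in a single heatmap cell represent can be sketched in base R. The example below simulates a small module, takes its eigengene as the first singular vector of the standardized expression matrix, and tests its correlation with a trait using the Student t statistic for a Pearson correlation. The simulated data, the sample size, and the variable names are assumptions for illustration only; the notebook itself relies on `moduleEigengenes` and `corPvalueStudent`.

```r
set.seed(1)

# Hypothetical trait (body weight in grams) for 20 samples.
n     <- 20
trait <- seq(30, 49, length.out = n)

# Simulated module: five genes that follow the trait plus independent noise.
signal      <- as.numeric(scale(trait))
module_expr <- sapply(1:5, function(i) 0.8 * signal + rnorm(n, sd = 0.3))
colnames(module_expr) <- paste0("gene", 1:5)

# Eigengene: first left singular vector of the gene-standardized expression matrix.
eigengene <- svd(scale(module_expr))$u[, 1]

# The sign of a singular vector is arbitrary; align it with the average expression.
if (cor(eigengene, rowMeans(module_expr)) < 0) eigengene <- -eigengene

# Module-trait correlation and its two-sided Student t p-value.
r      <- cor(eigengene, trait)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p      <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
```

The p-value computed this way should agree with `cor.test(eigengene, trait)$p.value`, which is a convenient cross-check.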

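2. Gene relationship to traits and important modules: gene significance and module membership

We define gene significance (GS) as the absolute value of the correlation between a gene and the trait, which quantifies how strongly an individual gene is associated with the trait of interest. For each module, we also define module membership (MM) as the correlation between the module eigengene and the gene expression profile. This lets us quantify how similar every gene on the array is to every module.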

[21]

weight = as.data.frame(datTraits$weight_g);

names(weight) = "weight";

modNames = substring(names(MEs), 3)

geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"));

MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));

names(geneModuleMembership) = paste("MM", modNames, sep="");

names(MMPvalue) = paste("p.MM", modNames, sep="");

geneTraitSignificance = as.data.frame(cor(datExpr, weight, use = "p")); # correlation of each gene with the body-weight trait

GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));

names(geneTraitSignificance) = paste("GS.", names(weight), sep="");

names(GSPvalue) = paste("p.GS.", names(weight), sep="");

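3. Intramodular analysis: identifying genes with high GS and high MM

Using the GS and MM measures, we can identify genes that are strongly associated with body weight and that are also highly connected members of the module of interest. In this example body weight is most strongly associated with the brown module, so we plot gene significance against module membership for the brown module.

GS: the absolute value of the correlation between a gene's expression and a phenotypic trait such as body weight. It links trait information to the co-expression network, for example to find out which module is strongly associated with body weight. A value of 0 means the gene is unrelated to the trait, and a value of 1 means it is highly related. If the genes in a module are all strongly related to the trait, the module as a whole is strongly related to the trait as well.

MM: the correlation (cor) between a gene's expression profile and a module eigengene. Each value describes the relationship between one gene and one module. If the absolute value is close to 0, the gene is not part of the module; if the absolute value is close to 1, the gene is strongly related to the module.

Run the following code to visualize GS and MM: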

[22]

module = "brown"

column = match(module, modNames);

moduleGenes = moduleColors==module;

par(mfrow = c(1,1));

verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),

abs(geneTraitSignificance[moduleGenes, 1]),

xlab = paste("Module Membership in", module, "module"),

ylab = "Gene significance for body weight",

main = paste("Module membership vs. gene significance\n"),

cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

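Each point in the MM-GS plot is a gene. The x-axis shows the gene's correlation with the module (MM) and the y-axis shows the gene's correlation with the trait (GS). The plot shows that genes that are highly significantly associated with the trait are often also the most important elements of the modules associated with that trait.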
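
One simple way to turn such a plot into a gene list is to keep the genes whose |MM| and |GS| both exceed chosen cutoffs. The sketch below does this on made-up values; the gene names, the numbers, and the cutoffs `mm_cut` and `gs_cut` are all hypothetical and would need to be chosen for a real dataset.

```r
# Made-up module membership (MM) and gene significance (GS) values.
genes <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  MM   = c(0.92, 0.85, 0.40, -0.88),
  GS   = c(0.61, 0.15, 0.70, -0.55)
)

# Hypothetical cutoffs on the absolute values.
mm_cut <- 0.8
gs_cut <- 0.5

# Candidate hub genes: high in both measures, regardless of sign.
is_hub <- abs(genes$MM) >= mm_cut & abs(genes$GS) >= gs_cut
hub    <- genes$gene[is_hub]
```

With the notebook's objects, the same filter could be applied to the brown-module rows of `geneModuleMembership` and `geneTraitSignificance`.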

4. Output the network analysis results

[23]

# return the IDs of all genes included in the analysis

names(datExpr)

'MMT00000044'
'MMT00000046'
'MMT00000051'
'MMT00000076'
'MMT00000080'
'MMT00000102'
'MMT00000149'
'MMT00000159'
'MMT00000207'
'MMT00000212'
'MMT00000231'
'MMT00000241'
'MMT00000268'
'MMT00000283'
'MMT00000334'
'MMT00000365'
'MMT00000368'
'MMT00000373'
'MMT00000384'
'MMT00000401'
'MMT00000418'
'MMT00000464'
'MMT00000517'
'MMT00000525'
'MMT00000549'
'MMT00000550'
'MMT00000602'
'MMT00000608'
'MMT00000701'
'MMT00000713'
'MMT00000719'
'MMT00000743'
'MMT00000792'
'MMT00000793'
'MMT00000801'
'MMT00000840'
'MMT00000864'
'MMT00000887'
'MMT00000963'
'MMT00000988'
'MMT00000996'
'MMT00001022'
'MMT00001077'
'MMT00001085'
'MMT00001100'
'MMT00001110'
'MMT00001154'
'MMT00001185'
'MMT00001190'
'MMT00001245'
'MMT00001260'
'MMT00001291'
'MMT00001298'
'MMT00001318'
'MMT00001373'
'MMT00001387'
'MMT00001394'
'MMT00001397'
'MMT00001423'
'MMT00001434'
'MMT00001486'
'MMT00001496'
'MMT00001510'
'MMT00001545'
'MMT00001555'
'MMT00001587'
'MMT00001596'
'MMT00001613'
'MMT00001646'
'MMT00001675'
'MMT00001698'
'MMT00001714'
'MMT00001732'
'MMT00001791'
'MMT00001806'
'MMT00001923'
'MMT00001947'
'MMT00001949'
'MMT00001995'
'MMT00002002'
'MMT00002004'
'MMT00002021'
'MMT00002022'
'MMT00002037'
'MMT00002042'
'MMT00002046'
'MMT00002048'
'MMT00002050'
'MMT00002099'
'MMT00002102'
'MMT00002151'
'MMT00002160'
'MMT00002161'
'MMT00002175'
'MMT00002209'
'MMT00002227'
'MMT00002238'
'MMT00002272'
'MMT00002304'
'MMT00002330'
'MMT00002338'
'MMT00002391'
'MMT00002392'
'MMT00002494'
'MMT00002521'
'MMT00002529'
'MMT00002532'
'MMT00002546'
'MMT00002575'
'MMT00002592'
'MMT00002594'
'MMT00002597'
'MMT00002655'
'MMT00002755'
'MMT00002758'
'MMT00002824'
'MMT00002875'
'MMT00002932'
'MMT00002956'
'MMT00003016'
'MMT00003058'
'MMT00003069'
'MMT00003071'
'MMT00003081'
'MMT00003107'
'MMT00003127'
'MMT00003136'
'MMT00003188'
'MMT00003211'
'MMT00003214'
'MMT00003278'
'MMT00003342'
'MMT00003365'
'MMT00003391'
'MMT00003410'
'MMT00003424'
'MMT00003453'
'MMT00003456'
'MMT00003470'
'MMT00003498'
'MMT00003506'
'MMT00003530'
'MMT00003533'
'MMT00003545'
'MMT00003569'
'MMT00003575'
'MMT00003586'
'MMT00003596'
'MMT00003620'
'MMT00003651'
'MMT00003672'
'MMT00003724'
'MMT00003764'
'MMT00003905'
'MMT00003906'
'MMT00003908'
'MMT00003950'
'MMT00003968'
'MMT00003970'
'MMT00003975'
'MMT00003980'
'MMT00003982'
'MMT00003994'
'MMT00004034'
'MMT00004086'
'MMT00004126'
'MMT00004142'
'MMT00004170'
'MMT00004171'
'MMT00004172'
'MMT00004176'
'MMT00004227'
'MMT00004230'
'MMT00004254'
'MMT00004264'
'MMT00004276'
'MMT00004283'
'MMT00004326'
'MMT00004393'
'MMT00004394'
'MMT00004397'
'MMT00004398'
'MMT00004408'
'MMT00004428'
'MMT00004455'
'MMT00004520'
'MMT00004524'
'MMT00004529'
'MMT00004594'
'MMT00004605'
'MMT00004614'
'MMT00004625'
'MMT00004631'
'MMT00004639'
'MMT00004671'
'MMT00004682'
'MMT00004703'
'MMT00004721'
'MMT00004807'
'MMT00004841'
⋯
'MMT00078449'
'MMT00078455'
'MMT00078486'
'MMT00078506'
'MMT00078523'
'MMT00078527'
'MMT00078537'
'MMT00078543'
'MMT00078546'
'MMT00078551'
'MMT00078559'
'MMT00078566'
'MMT00078625'
'MMT00078657'
'MMT00078658'
'MMT00078676'
'MMT00078692'
'MMT00078698'
'MMT00078706'
'MMT00078723'
'MMT00078732'
'MMT00078811'
'MMT00078816'
'MMT00078831'
'MMT00078835'
'MMT00078837'
'MMT00078844'
'MMT00078851'
'MMT00078861'
'MMT00078909'
'MMT00078918'
'MMT00078919'
'MMT00078931'
'MMT00078940'
'MMT00078942'
'MMT00078950'
'MMT00078969'
'MMT00078976'
'MMT00079074'
'MMT00079130'
'MMT00079131'
'MMT00079144'
'MMT00079155'
'MMT00079156'
'MMT00079213'
'MMT00079275'
'MMT00079286'
'MMT00079290'
'MMT00079309'
'MMT00079316'
'MMT00079332'
'MMT00079343'
'MMT00079348'
'MMT00079364'
'MMT00079369'
'MMT00079385'
'MMT00079397'
'MMT00079426'
'MMT00079439'
'MMT00079517'
'MMT00079520'
'MMT00079550'
'MMT00079592'
'MMT00079610'
'MMT00079611'
'MMT00079617'
'MMT00079636'
'MMT00079659'
'MMT00079689'
'MMT00079723'
'MMT00079761'
'MMT00079786'
'MMT00079792'
'MMT00079850'
'MMT00079874'
'MMT00079876'
'MMT00079883'
'MMT00079885'
'MMT00079905'
'MMT00079956'
'MMT00080032'
'MMT00080077'
'MMT00080093'
'MMT00080097'
'MMT00080105'
'MMT00080150'
'MMT00080162'
'MMT00080165'
'MMT00080167'
'MMT00080321'
'MMT00080367'
'MMT00080406'
'MMT00080515'
'MMT00080518'
'MMT00080534'
'MMT00080541'
'MMT00080548'
'MMT00080563'
'MMT00080578'
'MMT00080620'
'MMT00080624'
'MMT00080630'
'MMT00080680'
'MMT00080684'
'MMT00080694'
'MMT00080695'
'MMT00080701'
'MMT00080717'
'MMT00080721'
'MMT00080768'
'MMT00080789'
'MMT00080792'
'MMT00080840'
'MMT00080864'
'MMT00080903'
'MMT00080943'
'MMT00080984'
'MMT00081002'
'MMT00081013'
'MMT00081019'
'MMT00081115'
'MMT00081122'
'MMT00081127'
'MMT00081133'
'MMT00081171'
'MMT00081203'
'MMT00081213'
'MMT00081218'
'MMT00081249'
'MMT00081261'
'MMT00081264'
'MMT00081290'
'MMT00081299'
'MMT00081300'
'MMT00081331'
'MMT00081348'
'MMT00081360'
'MMT00081375'
'MMT00081411'
'MMT00081414'
'MMT00081436'
'MMT00081439'
'MMT00081532'
'MMT00081543'
'MMT00081555'
'MMT00081571'
'MMT00081578'
'MMT00081596'
'MMT00081689'
'MMT00081718'
'MMT00081757'
'MMT00081768'
'MMT00081874'
'MMT00081880'
'MMT00081919'
'MMT00081967'
'MMT00081975'
'MMT00082034'
'MMT00082041'
'MMT00082073'
'MMT00082101'
'MMT00082110'
'MMT00082126'
'MMT00082164'
'MMT00082181'
'MMT00082243'
'MMT00082250'
'MMT00082255'
'MMT00082259'
'MMT00082303'
'MMT00082316'
'MMT00082420'
'MMT00082428'
'MMT00082445'
'MMT00082461'
'MMT00082551'
'MMT00082556'
'MMT00082577'
'MMT00082579'
'MMT00082585'
'MMT00082592'
'MMT00082622'
'MMT00082650'
'MMT00082651'
'MMT00082663'
'MMT00082677'
'MMT00082712'
'MMT00082753'
'MMT00082759'
'MMT00082798'
'MMT00082822'
'MMT00082828'
'MMT00082829'
'MMT00082832'
'MMT00082847'
'MMT00082850'
'MMT00082869'
'MMT00082877'
'MMT00082899'
'MMT00082906'

[24]

# return the IDs of the genes that belong to the brown module

names(datExpr)[moduleColors=="brown"]

'MMT00000887'
'MMT00001077'
'MMT00001185'
'MMT00001486'
'MMT00002002'
'MMT00002037'
'MMT00002102'
'MMT00002209'
'MMT00002575'
'MMT00002758'
'MMT00002824'
'MMT00003081'
'MMT00003586'
'MMT00003596'
'MMT00003970'
'MMT00003982'
'MMT00003994'
'MMT00004034'
'MMT00004170'
'MMT00004283'
'MMT00004397'
'MMT00004428'
'MMT00004844'
'MMT00006001'
'MMT00006077'
'MMT00006097'
'MMT00006230'
'MMT00006315'
'MMT00006378'
'MMT00006545'
'MMT00006709'
'MMT00006713'
'MMT00006822'
'MMT00006859'
'MMT00007042'
'MMT00007205'
'MMT00007277'
'MMT00007603'
'MMT00007709'
'MMT00007836'
'MMT00007847'
'MMT00007859'
'MMT00007963'
'MMT00007995'
'MMT00008094'
'MMT00008463'
'MMT00008968'
'MMT00008970'
'MMT00009272'
'MMT00009690'
'MMT00009857'
'MMT00009951'
'MMT00010412'
'MMT00010542'
'MMT00010602'
'MMT00010873'
'MMT00010907'
'MMT00011268'
'MMT00011876'
'MMT00012202'
'MMT00012203'
'MMT00012511'
'MMT00012992'
'MMT00013100'
'MMT00013122'
'MMT00013203'
'MMT00013227'
'MMT00013704'
'MMT00013759'
'MMT00014132'
'MMT00014558'
'MMT00014630'
'MMT00014730'
'MMT00015180'
'MMT00015289'
'MMT00015334'
'MMT00015563'
'MMT00015593'
'MMT00015674'
'MMT00016457'
'MMT00016835'
'MMT00016958'
'MMT00017188'
'MMT00017203'
'MMT00017421'
'MMT00017456'
'MMT00017674'
'MMT00017718'
'MMT00018071'
'MMT00018085'
'MMT00018374'
'MMT00018479'
'MMT00018643'
'MMT00018797'
'MMT00019063'
'MMT00019191'
'MMT00019257'
'MMT00019405'
'MMT00019744'
'MMT00020088'
'MMT00020374'
'MMT00020598'
'MMT00020770'
'MMT00020883'
'MMT00021004'
'MMT00021090'
'MMT00021275'
'MMT00021643'
'MMT00021734'
'MMT00021743'
'MMT00021805'
'MMT00022098'
'MMT00022230'
'MMT00022657'
'MMT00022754'
'MMT00022932'
'MMT00024107'
'MMT00024150'
'MMT00024300'
'MMT00024492'
'MMT00024851'
'MMT00025030'
'MMT00025048'
'MMT00025256'
'MMT00025527'
'MMT00025842'
'MMT00025886'
'MMT00026028'
'MMT00026117'
'MMT00026255'
'MMT00026611'
'MMT00026638'
'MMT00027064'
'MMT00027170'
'MMT00027378'
'MMT00027530'
'MMT00027663'
'MMT00027667'
'MMT00027763'
'MMT00027861'
'MMT00027989'
'MMT00028002'
'MMT00028568'
'MMT00028633'
'MMT00028763'
'MMT00028861'
'MMT00028979'
'MMT00029126'
'MMT00029192'
'MMT00029369'
'MMT00030150'
'MMT00030176'
'MMT00030229'
'MMT00030448'
'MMT00030465'
'MMT00030541'
'MMT00030781'
'MMT00030800'
'MMT00030931'
'MMT00031029'
'MMT00031086'
'MMT00031229'
'MMT00031263'
'MMT00031585'
'MMT00031586'
'MMT00031617'
'MMT00031650'
'MMT00031751'
'MMT00032175'
'MMT00032542'
'MMT00032545'
'MMT00032680'
'MMT00032840'
'MMT00032920'
'MMT00033105'
'MMT00033171'
'MMT00033222'
'MMT00033268'
'MMT00034286'
'MMT00034467'
'MMT00034709'
'MMT00034792'
'MMT00034839'
'MMT00034916'
'MMT00035158'
'MMT00035243'
'MMT00035724'
'MMT00035984'
'MMT00036340'
'MMT00036739'
'MMT00036954'
'MMT00037447'
'MMT00038270'
'MMT00038471'
'MMT00038915'
'MMT00038934'
'MMT00039183'
'MMT00039459'
'MMT00039764'
'MMT00039882'
⋯
'MMT00042929'
'MMT00042972'
'MMT00043411'
'MMT00043537'
'MMT00043939'
'MMT00043964'
'MMT00044287'
'MMT00044996'
'MMT00045252'
'MMT00045344'
'MMT00045751'
'MMT00046778'
'MMT00046836'
'MMT00047127'
'MMT00047197'
'MMT00047418'
'MMT00048209'
'MMT00048535'
'MMT00048720'
'MMT00049092'
'MMT00049111'
'MMT00049221'
'MMT00049383'
'MMT00049553'
'MMT00049556'
'MMT00049743'
'MMT00050031'
'MMT00050086'
'MMT00050363'
'MMT00050552'
'MMT00050576'
'MMT00051177'
'MMT00051278'
'MMT00051292'
'MMT00051303'
'MMT00051523'
'MMT00052337'
'MMT00052658'
'MMT00052695'
'MMT00052859'
'MMT00053210'
'MMT00053218'
'MMT00053489'
'MMT00053497'
'MMT00053545'
'MMT00053917'
'MMT00054261'
'MMT00054422'
'MMT00054464'
'MMT00054735'
'MMT00055005'
'MMT00055132'
'MMT00055391'
'MMT00055441'
'MMT00056362'
'MMT00056584'
'MMT00056716'
'MMT00056798'
'MMT00057508'
'MMT00058021'
'MMT00058158'
'MMT00058222'
'MMT00058752'
'MMT00059202'
'MMT00059241'
'MMT00059258'
'MMT00059782'
'MMT00060094'
'MMT00060423'
'MMT00060443'
'MMT00060559'
'MMT00060760'
'MMT00060952'
'MMT00061101'
'MMT00061203'
'MMT00061256'
'MMT00061313'
'MMT00061484'
'MMT00061509'
'MMT00061586'
'MMT00061735'
'MMT00061739'
'MMT00061815'
'MMT00061857'
'MMT00061884'
'MMT00061892'
'MMT00061998'
'MMT00062460'
'MMT00062787'
'MMT00062990'
'MMT00063198'
'MMT00063359'
'MMT00063470'
'MMT00063623'
'MMT00064235'
'MMT00064433'
'MMT00064617'
'MMT00064719'
'MMT00064851'
'MMT00064897'
'MMT00065001'
'MMT00065115'
'MMT00065116'
'MMT00065159'
'MMT00065770'
'MMT00066884'
'MMT00067008'
'MMT00067079'
'MMT00067105'
'MMT00067156'
'MMT00067261'
'MMT00067296'
'MMT00067525'
'MMT00067823'
'MMT00068494'
'MMT00068509'
'MMT00068530'
'MMT00068861'
'MMT00069165'
'MMT00069425'
'MMT00069884'
'MMT00070201'
'MMT00070277'
'MMT00070342'
'MMT00070429'
'MMT00070618'
'MMT00070677'
'MMT00070750'
'MMT00071052'
'MMT00071242'
'MMT00071411'
'MMT00071664'
'MMT00071772'
'MMT00071856'
'MMT00071857'
'MMT00071976'
'MMT00072042'
'MMT00072057'
'MMT00072237'
'MMT00072411'
'MMT00072657'
'MMT00073157'
'MMT00073211'
'MMT00073308'
'MMT00073344'
'MMT00073365'
'MMT00073735'
'MMT00073829'
'MMT00074488'
'MMT00074499'
'MMT00074523'
'MMT00074527'
'MMT00074580'
'MMT00074886'
'MMT00074990'
'MMT00075171'
'MMT00075402'
'MMT00075556'
'MMT00075754'
'MMT00076056'
'MMT00076233'
'MMT00076371'
'MMT00076382'
'MMT00076602'
'MMT00076754'
'MMT00076864'
'MMT00077152'
'MMT00077244'
'MMT00077345'
'MMT00077649'
'MMT00078015'
'MMT00078110'
'MMT00078258'
'MMT00078486'
'MMT00078698'
'MMT00078723'
'MMT00078816'
'MMT00078851'
'MMT00078976'
'MMT00079074'
'MMT00079144'
'MMT00079155'
'MMT00079156'
'MMT00079213'
'MMT00079316'
'MMT00079723'
'MMT00079874'
'MMT00080695'
'MMT00081019'
'MMT00081127'
'MMT00081171'
'MMT00081331'
'MMT00081375'
'MMT00081436'
'MMT00081571'
'MMT00081975'
'MMT00082041'
'MMT00082551'
'MMT00082712'
'MMT00082759'


[25]

annot = read.csv(file = "/bohr/wgcna-ss71/v1/GeneAnnotation.csv");

dim(annot)

names(annot)

probes = names(datExpr) # probe IDs to match against the annotation

probes2annot = match(probes, annot$substanceBXH);

sum(is.na(probes2annot)) # number of probes without a match; normally 0, meaning every probe was matched

23388
34

'X'
'ID'
'arrayname'
'substanceBXH'
'gene_symbol'
'LocusLinkID'
'OfficialGeneSymbol'
'OfficialGeneName'
'LocusLinkSymbol'
'LocusLinkName'
'ProteomeShortDescription'
'UnigeneCluster'
'LocusLinkCode'
'ProteomeID'
'ProteomeCode'
'SwissprotID'
'OMIMCode'
'DirectedTilingPriority'
'AlternateSymbols'
'AlternateNames'
'SpeciesID'
'cytogeneticLoc'
'Organism'
'clustername'
'reporterid'
'probeid'
'sequenceid'
'clusterid'
'chromosome'
'startcoordinate'
'endcoordinate'
'strand'
'sequence_3_to_5_prime'
'sequence_5_to_3_prime'

[26]

# collect the key information for each gene:

geneInfo0 = data.frame(substanceBXH = probes,

geneSymbol = annot$gene_symbol[probes2annot],

LocusLinkID = annot$LocusLinkID[probes2annot],

moduleColor = moduleColors,

geneTraitSignificance,

GSPvalue);

# order the modules by the strength of their correlation with body weight:

modOrder = order(-abs(cor(MEs, weight, use = "p")));

# add the module membership information:

for (mod in 1:ncol(geneModuleMembership))

{

oldNames = names(geneInfo0)

geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]],

MMPvalue[, modOrder[mod]]);

names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),

paste("p.MM.", modNames[modOrder[mod]], sep=""))

}

geneOrder = order(geneInfo0$moduleColor, -abs(geneInfo0$GS.weight)); # sort by module color, then by decreasing |GS.weight|

geneInfo = geneInfo0[geneOrder, ]

# write to CSV; the table can also be viewed in R with fix(geneInfo):

write.csv(geneInfo, file = "geneInfo.csv")

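IV. Interfacing the network analysis with functional annotation (GO, etc.)

Functionally annotate the important genes.

1. Export gene lists for use with online tools

Export lists of gene identifiers that can be used as input to popular gene ontology and functional enrichment analysis tools such as DAVID and AmiGO. For example, the code below writes the LocusLinkID (Entrez) identifiers of the brown, red, and salmon modules to one file per module: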

[27]

annot = read.csv(file = "/bohr/wgcna-ss71/v1/GeneAnnotation.csv")

probes = names(datExpr)

probes2annot = match(probes, annot$substanceBXH)

allLLIDs = annot$LocusLinkID[probes2annot]

intModules = c("brown", "red", "salmon")

for (module in intModules)

{

# Select module probes

modGenes = (moduleColors==module);

# Get their entrez ID codes

modLLIDs = allLLIDs[modGenes];

# Write them into a file

fileName = paste("LocusLinkIDs-", module, ".txt", sep="");

write.table(as.data.frame(modLLIDs), file = fileName,

row.names = FALSE, col.names = FALSE)

}

2. GO enrichment analysis directly in R

[28]

GOenr = GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", nBestP = 10) # this example analyzes mouse gene expression

tab = GOenr$bestPTerms[[4]]$enrichment

Warning message in GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", :
“This function is deprecated and will be removed in the near future. 
We suggest using the replacement function enrichmentAnalysis 
in R package anRichment, available from the following URL:
https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/”
Loading required package: org.Mm.eg.db

Loading required package: AnnotationDbi

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which.max, which.min


Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Loading required package: IRanges

Loading required package: S4Vectors


Attaching package: ‘S4Vectors’


The following object is masked from ‘package:utils’:

    findMatches


The following objects are masked from ‘package:base’:

    I, expand.grid, unname




Loading required package: GO.db

 GOenrichmentAnalysis: loading annotation data...
  ..of the 3038  Entrez identifiers submitted, 2843 are mapped in current GO categories.
  ..will use 2843 background genes for enrichment calculations.
  ..preparing term lists (this may take a while).. 
  ..working on label set 1 ..
    ..calculating enrichments (this may also take a while)..
    ..putting together terms with highest enrichment significance..
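To make the "calculating enrichments" step more concrete, here is a minimal sketch of a per-term enrichment p-value, assuming a one-sided Fisher exact test of the overlap between a module and a GO term. The background size of 2843 is taken from the log above; the module size, term size, overlap, and number of tests are made-up numbers, and the code is not the internals of GOenrichmentAnalysis.

```r
# Counts for one module/term pair. Only nBackground comes from the log above;
# the other three numbers are hypothetical.
nBackground <- 2843  # background genes used for enrichment
nModule     <- 428   # genes in the module (hypothetical here)
nTerm       <- 300   # background genes annotated to the term (hypothetical)
nOverlap    <- 90    # module genes annotated to the term (hypothetical)

# 2 x 2 contingency table: rows = in/out of module, columns = in/out of term.
counts <- matrix(
  c(nOverlap,         nModule - nOverlap,
    nTerm - nOverlap, nBackground - nModule - nTerm + nOverlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(module = c("in", "out"), term = c("in", "out"))
)

# One-sided Fisher exact test: is the overlap larger than expected by chance?
pFisher <- fisher.test(counts, alternative = "greater")$p.value

# The same probability written as a hypergeometric tail, P(X >= nOverlap).
pHyper <- phyper(nOverlap - 1, nTerm, nBackground - nTerm, nModule,
                 lower.tail = FALSE)

# Bonferroni correction: multiply by the number of tests and cap at 1.
nTests <- 19000  # hypothetical number of module/term tests
pBonf  <- min(1, pFisher * nTests)
print(c(pFisher = pFisher, pHyper = pHyper, pBonf = pBonf))
```

Because many thousands of terms are tested, a raw p-value has to be very small to survive the Bonferroni correction, which helps when reading an enrichment table that reports both a raw and a corrected p-value.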

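The object tab extracted above is an enrichment table containing the 10 best terms for each module. As the warning in the output states, GOenrichmentAnalysis is deprecated, and its suggested replacement is the function enrichmentAnalysis in the R package anRichment. The column names of the table can be listed, and the table written to a CSV file, as follows: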

[29]

names(tab)

write.table(tab, file = "GOEnrichmentTable.csv", sep = ",", quote = TRUE, row.names = FALSE)

'module'
'modSize'
'bkgrModSize'
'rank'
'enrichmentP'
'BonferoniP'
'nModGenesInTerm'
'fracOfBkgrModSize'
'fracOfBkgrTermSize'
'bkgrTermSize'
'termID'
'termOntology'
'termName'
'termDefinition'

The table can also be trimmed to a few key columns so that it displays compactly in R:

[30]

keepCols = c(1, 2, 5, 6, 7, 12, 13);

screenTab = tab[, keepCols];

numCols = c(3, 4);

screenTab[, numCols] = signif(apply(screenTab[, numCols], 2, as.numeric), 2) # round to 2 significant digits

# Truncate term names to at most 40 characters:

screenTab[, 7] = substring(screenTab[, 7], 1, 40)

colnames(screenTab) = c("module", "size", "p-val", "Bonf", "nInTerm", "ont", "term name");

rownames(screenTab) = NULL;

options(width=95)

screenTab

A data.frame: 190 × 7
module	size	p-val	Bonf	nInTerm	ont	term name
<chr>	<int>	<dbl>	<dbl>	<dbl>	<chr>	<chr>
black	166	7.0e-05	1.0e+00	15	MF	receptor ligand activity
black	166	7.9e-05	1.0e+00	15	MF	signaling receptor activator activity
black	166	1.8e-04	1.0e+00	15	MF	signaling receptor regulator activity
black	166	6.6e-04	1.0e+00	5	BP	mRNA transport
black	166	6.9e-04	1.0e+00	4	BP	dopamine transport
black	166	2.0e-03	1.0e+00	6	BP	rRNA metabolic process
black	166	2.2e-03	1.0e+00	5	BP	RNA transport
black	166	2.4e-03	1.0e+00	15	BP	G protein-coupled receptor signaling pat
black	166	2.6e-03	1.0e+00	2	BP	ventricular compact myocardium morphogen
black	166	2.6e-03	1.0e+00	2	BP	synaptic vesicle fusion to presynaptic a
blue	428	5.1e-44	9.8e-40	193	BP	immune system process
blue	428	2.6e-42	4.9e-38	150	BP	immune response
blue	428	9.1e-35	1.7e-30	115	BP	defense response to other organism
blue	428	3.8e-34	7.2e-30	144	BP	defense response
blue	428	8.4e-34	1.6e-29	132	BP	regulation of immune system process
blue	428	2.9e-32	5.6e-28	131	BP	response to external biotic stimulus
blue	428	2.9e-32	5.6e-28	131	BP	response to other organism
blue	428	1.9e-31	3.6e-27	106	BP	positive regulation of immune system pro
blue	428	1.0e-29	2.0e-25	93	BP	innate immune response
blue	428	1.7e-29	3.2e-25	132	BP	biological process involved in interspec
brown	396	2.4e-23	4.6e-19	51	CC	extracellular matrix
brown	396	2.1e-18	4.0e-14	37	CC	collagen-containing extracellular matrix
brown	396	8.1e-18	1.6e-13	30	MF	extracellular matrix structural constitu
brown	396	8.5e-18	1.6e-13	124	CC	extracellular region
brown	396	3.3e-14	6.3e-10	36	BP	extracellular matrix organization
brown	396	6.1e-13	1.2e-08	62	BP	vasculature development
brown	396	1.2e-12	2.2e-08	60	BP	blood vessel development
brown	396	1.8e-12	3.4e-08	95	CC	extracellular space
brown	396	1.9e-12	3.7e-08	196	CC	cell periphery
brown	396	2.5e-11	4.9e-07	52	BP	blood vessel morphogenesis
⋮	⋮	⋮	⋮	⋮	⋮	⋮
tan	81	6.9e-05	1.00	3	BP	platelet dense granule organization
tan	81	1.7e-04	1.00	3	BP	vesicle cargo loading
tan	81	3.8e-04	1.00	38	MF	catalytic activity
tan	81	6.9e-04	1.00	2	BP	clathrin-coated vesicle cargo loading, A
tan	81	6.9e-04	1.00	2	CC	AP-3 adaptor complex
tan	81	7.7e-04	1.00	6	BP	hormone metabolic process
tan	81	9.0e-04	1.00	3	BP	secretory granule organization
tan	81	1.7e-03	1.00	32	CC	endomembrane system
tan	81	1.8e-03	1.00	20	BP	small molecule metabolic process
tan	81	1.9e-03	1.00	3	BP	canonical glycolysis
turquoise	529	2.8e-05	0.54	12	BP	translational initiation
turquoise	529	5.3e-05	1.00	36	CC	intracellular protein-containing complex
turquoise	529	9.2e-05	1.00	39	MF	transcription factor binding
turquoise	529	1.2e-04	1.00	16	BP	sensory perception of chemical stimulus
turquoise	529	1.5e-04	1.00	14	BP	sensory perception of smell
turquoise	529	1.8e-04	1.00	26	MF	transcription coregulator activity
turquoise	529	2.9e-04	1.00	378	CC	intracellular organelle
turquoise	529	4.9e-04	1.00	73	BP	positive regulation of macromolecule bio
turquoise	529	5.7e-04	1.00	124	CC	nucleoplasm
turquoise	529	5.9e-04	1.00	136	CC	nuclear lumen
yellow	199	1.2e-04	1.00	3	MF	nickel cation binding
yellow	199	1.8e-04	1.00	5	CC	endocytic vesicle membrane
yellow	199	1.9e-04	1.00	4	BP	xenobiotic catabolic process
yellow	199	3.7e-04	1.00	4	BP	regulation of animal organ formation
yellow	199	8.2e-04	1.00	7	BP	glycolytic process
yellow	199	8.2e-04	1.00	7	BP	ATP generation from ADP
yellow	199	8.4e-04	1.00	5	BP	vesicle budding from membrane
yellow	199	1.0e-03	1.00	4	BP	benzene-containing compound metabolic pr
yellow	199	1.1e-03	1.00	14	BP	purine nucleotide metabolic process
yellow	199	1.2e-03	1.00	7	BP	nucleoside diphosphate phosphorylation
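In the table above, only a few modules (for example blue and brown) have terms whose Bonferroni-corrected p-value is well below 1. A simple way to focus on those is to filter on the corrected p-value. The sketch below does this on four rows copied from the output above; the 0.05 cutoff is a conventional choice and an assumption, not something the notebook prescribes.

```r
# Four rows copied from the enrichment output above; only a subset of the
# columns of `tab` is reproduced.
toyTab <- data.frame(
  module      = c("black", "blue", "brown", "turquoise"),
  enrichmentP = c(7.0e-05, 5.1e-44, 2.4e-23, 2.8e-05),
  BonferoniP  = c(1.0, 9.8e-40, 4.6e-19, 0.54),
  termName    = c("receptor ligand activity", "immune system process",
                  "extracellular matrix", "translational initiation")
)

# Keep the terms that remain significant after Bonferroni correction
# (cutoff of 0.05 is an assumed, conventional threshold).
sigTab <- toyTab[toyTab$BonferoniP < 0.05, ]

# Sort so that the strongest enrichment comes first.
sigTab <- sigTab[order(sigTab$BonferoniP), ]
print(sigTab)
```

On the full table the same two lines would be applied to tab directly, using its BonferoniP column.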

V. Network visualization

[31]

nGenes = ncol(datExpr)

nSamples = nrow(datExpr)

1. Visualizing the gene network

Compute the TOM dissimilarity matrix and plot it as a heatmap:

[32]

dissTOM = 1-TOMsimilarityFromExpr(datExpr, power = 6);

plotTOM = dissTOM^7;

diag(plotTOM) = NA;

sizeGrWindow(9,9)

TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")

TOM calculation: adjacency..
..will not use multithreading.
 Fraction of slow calculations: 0.396405
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
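The log above lists the stages of the TOM calculation (adjacency, connectivity, matrix multiplication, normalization). As a rough sketch of what these stages compute, the code below builds an unsigned topological overlap matrix for a small random data set using the standard formula TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij). The toy data and variable names are hypothetical, and TOMsimilarityFromExpr has further options (network type, TOM type, denominator) that are not reproduced here.

```r
set.seed(1)
# Hypothetical toy expression data: 20 samples (rows) x 5 genes (columns).
toyExpr <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Adjacency: absolute correlation raised to the soft-thresholding power 6.
adj <- abs(cor(toyExpr))^6
diag(adj) <- 0                    # ignore self-connections

# Connectivity: total connection strength of each gene.
k <- rowSums(adj)

# Shared-neighbor term: sum over u of a_iu * a_uj (matrix multiplication).
shared <- adj %*% adj

# Normalization: divide by min(k_i, k_j) + 1 - a_ij.
kMin <- outer(k, k, pmin)
tom  <- (shared + adj) / (kMin + 1 - adj)
diag(tom) <- 1

# Dissimilarity, analogous to dissTOM in the notebook.
toyDissTOM <- 1 - tom
print(round(toyDissTOM, 3))
```

Two genes get a high overlap when they are strongly connected to each other and also share strongly connected neighbors, which is why TOM tends to be more robust than the raw adjacency.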

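One way to visualize a weighted network is to plot it as a heatmap. Each row and each column of the heatmap corresponds to one gene. Light colors indicate low adjacency (topological overlap), and darker colors indicate higher adjacency (overlap).

Generating the heatmap can take a substantial amount of time. Restricting the number of genes speeds up the plotting, although the dendrogram of a gene subset will generally look different from the dendrogram of all genes. The following cell randomly selects 400 genes for plotting: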

[33]

nSelect = 400

set.seed(10);

select = sample(nGenes, size = nSelect);

selectTOM = dissTOM[select, select];

selectTree = hclust(as.dist(selectTOM), method = "average")

selectColors = moduleColors[select];

plotDiss = selectTOM^7;

diag(plotDiss) = NA;

TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")

# Change the heatmap's dark background to a light one:

library(gplots)

myheatcol = colorpanel(250,'red',"orange",'lemonchiffon')

TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes", col=myheatcol)
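Both heatmap cells raise the dissimilarity to the power 7 before plotting. A likely reason, stated here as an assumption, is that TOM dissimilarities tend to cluster close to 1, so a power transform spreads them over more of the color scale without changing their order. The values below are made up purely to show the effect.

```r
# Hypothetical dissimilarities, mostly close to 1.
toyDiss <- c(0.80, 0.95, 0.99, 1.00)

# Same transform as plotTOM = dissTOM^7 in the notebook.
toyPlot <- toyDiss^7

# Compare the spread of values before and after the transform.
spreadBefore <- diff(range(toyDiss))
spreadAfter  <- diff(range(toyPlot))
print(round(toyPlot, 3))
print(c(before = spreadBefore, after = spreadAfter))
```

The transform is monotone, so the ranking of gene pairs is unchanged; only the contrast of the picture improves.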

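2. Visualizing the eigengene network

To study the relationships among the modules found, the module eigengenes can be used as representative profiles, and module similarity can be quantified by the correlation between eigengenes. The package provides the function plotEigengeneNetworks, which generates a summary plot of the eigengene network.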

[34]

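# Recalculate the module eigengenes

MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes

# Extract the body-weight trait

weight = as.data.frame(datTraits$weight_g);

names(weight) = "weight"

# Add weight to the module eigengenes

MET = orderMEs(cbind(MEs, weight))

# Plot the dendrogram and heatmap of the module eigengenes together with weight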

par(cex = 0.9)

plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle = 90)

[35]

# Plot the dendrogram and the heatmap separately:

par(cex = 1.0)

plotEigengeneNetworks(MET, "Eigengene dendrogram", marDendro = c(0,4,2,0),

plotHeatmaps = FALSE)

par(cex = 1.0)

plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3,4,2,2),

plotDendrograms = FALSE, xLabelsAngle = 90)

The plots show that weight is most closely related to the modules MEbrown, MEred, and MEblue.
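For readers who want to see what an eigengene network is numerically, the sketch below computes eigengenes for two simulated modules as the first principal component of the standardized expression, then derives their correlation and a signed adjacency (1 + cor) / 2. The simulated data, the helper toyEigengene, and the choice of the signed adjacency are assumptions made for illustration; this is not the code of moduleEigengenes or plotEigengeneNetworks.

```r
set.seed(42)
nSamp <- 30

# Two hypothetical modules of 8 genes each, driven by related latent signals.
signal1 <- rnorm(nSamp)
signal2 <- 0.6 * signal1 + 0.8 * rnorm(nSamp)
mod1 <- sapply(1:8, function(i) signal1 + rnorm(nSamp, sd = 0.5))
mod2 <- sapply(1:8, function(i) signal2 + rnorm(nSamp, sd = 0.5))

# Hypothetical helper: eigengene = first principal component of the
# standardized (samples x genes) module expression.
toyEigengene <- function(x) {
  xs <- scale(x)
  me <- svd(xs)$u[, 1]
  # Orient the eigengene so it correlates positively with the mean profile.
  if (cor(me, rowMeans(xs)) < 0) me <- -me
  me
}

toyMEs <- data.frame(MEmod1 = toyEigengene(mod1),
                     MEmod2 = toyEigengene(mod2))

# Module similarity as eigengene correlation, rescaled to [0, 1].
meCor <- cor(toyMEs)
meAdj <- (1 + meCor) / 2
print(round(meAdj, 3))
```

Appending a trait column to the eigengene data frame, as done with weight in the notebook, simply adds one more row and column to this matrix.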

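VI. Exporting the network to network visualization software

This sixth step produces the result we usually want most, and the central figure of many WGCNA papers: the interaction network of the hub genes. It shows how to export the necessary data so that other software, such as VisANT or Cytoscape, can draw the network.

1. Exporting the data required by VisANT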

[36]

TOM = TOMsimilarityFromExpr(datExpr, power = 6);

module = "brown";

probes = names(datExpr)

inModule = (moduleColors==module);

modProbes = probes[inModule];

modTOM = TOM[inModule, inModule];

dimnames(modTOM) = list(modProbes, modProbes)

vis = exportNetworkToVisANT(modTOM,

file = paste("VisANTInput-", module, ".txt", sep=""),

weighted = TRUE,

threshold = 0,

probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol) )

TOM calculation: adjacency..
..will not use multithreading.
 Fraction of slow calculations: 0.396405
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.

[37]

# Restrict the export to the 30 most connected (hub) genes in this module:

nTop = 30;

IMConn = softConnectivity(datExpr[, modProbes]);

top = (rank(-IMConn) <= nTop)

vis = exportNetworkToVisANT(modTOM[top, top],

file = paste("VisANTInput-", module, "-top30.txt", sep=""),

weighted = TRUE,

threshold = 0,

probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol) )

 softConnectivity: FYI: connecitivty of genes with less than 45 valid samples will be returned as NA.
 ..calculating connectivities..

The exported files can be loaded into VisANT to edit and draw the interaction network.
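The hub-gene selection in the cell above ranks genes by their connectivity within the module and keeps the top nTop. The following sketch reproduces that logic in base R on random data, assuming an unsigned adjacency |cor|^6 summed over all other genes. The toy matrix and names are hypothetical, and softConnectivity itself has more options than shown here.

```r
set.seed(7)
# Hypothetical module: 25 samples x 10 genes.
toyExpr <- matrix(rnorm(25 * 10), nrow = 25,
                  dimnames = list(NULL, paste0("gene", 1:10)))

# Connectivity of each gene: sum of its adjacencies to all other genes.
toyAdj <- abs(cor(toyExpr))^6
diag(toyAdj) <- 0
toyConn <- rowSums(toyAdj)

# Keep the nTopToy most connected genes, as rank(-IMConn) <= nTop does.
nTopToy  <- 3
topToy   <- rank(-toyConn) <= nTopToy
hubNames <- names(toyConn)[topToy]
print(sort(toyConn[topToy], decreasing = TRUE))
```

Ranking the negated connectivity puts the most connected gene at rank 1, so the logical vector can be used directly to subset the module TOM, as in modTOM[top, top].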

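2. Exporting to Cytoscape

Cytoscape accepts an edge file and a node file as input, which lets the user specify attributes such as link weights and node colors. Here we export two modules (red and brown) to Cytoscape.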

[ ]

TOM = TOMsimilarityFromExpr(datExpr, power = 6);

annot = read.csv(file = "/bohr/wgcna-ss71/v1/GeneAnnotation.csv");

# Select the brown and red modules

modules = c("brown", "red");

probes = names(datExpr)

inModule = is.finite(match(moduleColors, modules));

modProbes = probes[inModule];

modGenes = annot$gene_symbol[match(modProbes, annot$substanceBXH)];

# Select the corresponding part of the TOM matrix

modTOM = TOM[inModule, inModule];

dimnames(modTOM) = list(modProbes, modProbes)

# Export the network into edge and node list files Cytoscape can read

cyt = exportNetworkToCytoscape(modTOM,

edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),

nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),

weighted = TRUE,

threshold = 0.02,

nodeNames = modProbes,

altNodeNames = modGenes,

nodeAttr = moduleColors[inModule])

The exported files can be loaded into Cytoscape to edit and draw the interaction network.
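To give a feel for what a thresholded edge list looks like, the sketch below converts a small made-up similarity matrix into a table of edges whose weight exceeds 0.02, the threshold used above. The matrix, the column names, and the use of a strict "greater than" comparison are assumptions for illustration; the files written by exportNetworkToCytoscape contain additional columns and may differ in detail.

```r
# Hypothetical 4-gene similarity matrix (symmetric, ones on the diagonal).
toyNames <- c("g1", "g2", "g3", "g4")
toyTOM <- matrix(c(1.00, 0.30, 0.01, 0.05,
                   0.30, 1.00, 0.10, 0.00,
                   0.01, 0.10, 1.00, 0.40,
                   0.05, 0.00, 0.40, 1.00),
                 nrow = 4, dimnames = list(toyNames, toyNames))

threshold <- 0.02

# Use only the upper triangle so that each undirected edge appears once,
# and keep pairs whose similarity exceeds the threshold.
idx <- which(upper.tri(toyTOM) & toyTOM > threshold, arr.ind = TRUE)

edges <- data.frame(
  fromNode = toyNames[idx[, "row"]],
  toNode   = toyNames[idx[, "col"]],
  weight   = toyTOM[idx]
)
print(edges)
```

Raising the threshold shrinks the edge list quickly, which is often necessary to keep a Cytoscape drawing readable.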
