Bohrium
robot
新建

空间站广场

论文
Notebooks
比赛
课程
Apps
我的主页
我的Notebooks
我的论文库
我的足迹

我的工作空间

任务
节点
文件
数据集
镜像
项目
数据库
公开
velocyto:揭示 scRNA-seq 数据的 RNA velocity
生物信息学
RNA velocity
scRNA-seq
python包
R包
命令行
生物信息学RNA velocityscRNA-seqpython包R包命令行
孙楠
发布于 2023-11-22
推荐镜像 :Basic Image:bohrium-notebook:2023-03-26
推荐机型 :c4_m16_cpu
赞 2
velocyto:揭示 scRNA-seq 数据的 RNA velocity
一、安装 velocyto
二、生成分析所需要的.loom格式的文件
三、估计 RNA velocity
1. 加载.loom文件
2. 数据质控
3. Gamma拟合
4. Gamma 拟合和extrapolation
5. 投影
参考资料:

velocyto:揭示 scRNA-seq 数据的 RNA velocity

代码
文本

Open In Bohrium

Velocyto 是一个用于分析 scRNA-seq 数据中表达动态的软件包。特别是,它可以通过区分标准单细胞 RNA 测序方案中未剪接和剪接的 mRNA 来估计单细胞的 RNA 速度。

原始论文:La Manno, Gioele, et al. RNA velocity of single cells. Nature 560.7719 (2018): 494-498. https://doi.org/10.1038/s41586-018-0414-6.

转录动力学模型:

截屏2023-11-22 下午4.14.45.png

RNA velocity 理论描述:

截屏2023-11-22 下午4.11.05.png

截屏2023-11-22 下午4.11.26.png

代码
文本

📖 上手指南
本文档可在 Bohrium Notebook 上直接运行。你可以点击界面上方按钮 开始连接,选择 `bohrium-notebook:2023-03-26` 镜像和 `c4_m16_cpu` 节点配置,稍等片刻选择 `R kernel` 即可运行。

代码
文本

一、安装 velocyto

代码
文本
[1]
!pip install velocyto
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting velocyto
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/66/e8fff9d3b824fd99c0f678c47c740fec058ce2d9a0cfdf11b114ea8889f2/velocyto-0.17.17.tar.gz (198 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 198.9/198.9 kB 1.4 MB/s eta 0:00:00a 0:00:01
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.23.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.7.3)
Requirement already satisfied: cython in /opt/conda/lib/python3.8/site-packages (from velocyto) (0.29.33)
Requirement already satisfied: numba in /opt/conda/lib/python3.8/site-packages (from velocyto) (0.56.4)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.8/site-packages (from velocyto) (3.7.1)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.0.2)
Requirement already satisfied: h5py in /opt/conda/lib/python3.8/site-packages (from velocyto) (3.1.0)
Collecting loompy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f0/e3/8dc87471b34bc0db4e72f51a7aa0b454b3b9d551e15900862c022050aca3/loompy-3.0.7.tar.gz (4.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 20.2 MB/s eta 0:00:00a 0:00:01
  Preparing metadata (setup.py) ... done
Collecting pysam
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/32/d1/a2d1cebe6c4f3acaf973d1fedcf7bf29209a4a4e8d5a9f2b9c5b20a2bcad/pysam-0.22.0-cp38-cp38-manylinux_2_28_x86_64.whl (24.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.4/24.4 MB 21.4 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: Click in /opt/conda/lib/python3.8/site-packages (from velocyto) (7.1.2)
Requirement already satisfied: pandas in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.5.3)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.8/site-packages (from loompy->velocyto) (65.6.3)
Collecting numpy-groupies
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/17/e0/1b8ed88ed696cf5710fc6d28f33a651fad9bdf183c3012eedd757e461be4/numpy_groupies-0.9.22.tar.gz (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.3/53.3 kB 15.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (0.11.0)
Requirement already satisfied: importlib-resources>=3.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (5.2.0)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (3.0.9)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (1.0.5)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (1.4.4)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (4.38.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (23.0)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.8/site-packages (from numba->velocyto) (6.0.0)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /opt/conda/lib/python3.8/site-packages (from numba->velocyto) (0.39.1)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.8/site-packages (from pandas->velocyto) (2022.7)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn->velocyto) (3.1.0)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn->velocyto) (1.2.0)
Collecting numpy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2f/14/abc14a3f3663739e5d3c8fd980201d10788d75fea5b0685734227052c4f0/numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.9/16.9 MB 28.7 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: zipp>=3.1.0 in /opt/conda/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib->velocyto) (3.14.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib->velocyto) (1.16.0)
Building wheels for collected packages: velocyto, loompy, numpy-groupies
  Building wheel for velocyto (setup.py) ... done
  Created wheel for velocyto: filename=velocyto-0.17.17-cp38-cp38-linux_x86_64.whl size=523758 sha256=e636d559adc3ed168d936689b88531732dae8e44a2efc7a3a7242c50d5b74a87
  Stored in directory: /root/.cache/pip/wheels/c1/3c/6e/68cd9bcfab44bc727185de0d03b2dfba2eef788e32d3b8d30f
  Building wheel for loompy (setup.py) ... done
  Created wheel for loompy: filename=loompy-3.0.7-py3-none-any.whl size=52018 sha256=7ec9292930abf525f221dd319d8d2b5306baa415b3aaca76d8f4b4530bbde3c9
  Stored in directory: /root/.cache/pip/wheels/bc/e7/b6/94504a02721c11aabf8124040e98ccdf78dee462ad3e7c78d2
  Building wheel for numpy-groupies (setup.py) ... done
  Created wheel for numpy-groupies: filename=numpy_groupies-0.9.22-py3-none-any.whl size=25846 sha256=b63872d7266b3cd9781e7796da5149f9e071fbd770943fc8cf328e9a65ff163f
  Stored in directory: /root/.cache/pip/wheels/88/a5/bc/0c26522b97154b79bcce8dfd11ebb1e043a3a1ecfeb3a0ed69
Successfully built velocyto loompy numpy-groupies
Installing collected packages: pysam, numpy, numpy-groupies, loompy, velocyto
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.5
    Uninstalling numpy-1.23.5:
      Successfully uninstalled numpy-1.23.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
moviepy 0.2.3.5 requires decorator<5.0,>=4.0.2, but you have decorator 5.1.1 which is incompatible.
cvxpy 1.2.3 requires setuptools<=64.0.2, but you have setuptools 65.6.3 which is incompatible.
Successfully installed loompy-3.0.7 numpy-1.22.4 numpy-groupies-0.9.22 pysam-0.22.0 velocyto-0.17.17
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
代码
文本

二、生成分析所需要的.loom格式的文件

代码
文本

velocyto 支持命令行方式从.bam/.sam文件生成 spliced/unspliced 计数矩阵的.loom文件,具体使用方式见:http://velocyto.org/velocyto.py/tutorial/cli.html#introduction。

本文档用现成的.loom格式文件:

代码
文本
[2]
from urllib.request import urlretrieve
urlretrieve("http://pklab.med.harvard.edu/velocyto/hgForebrainGlut/hgForebrainGlut.loom", "hgForebrainGlut.loom")
('hgForebrainGlut.loom', <http.client.HTTPMessage at 0x7fcbcb1cd490>)
代码
文本

Dentate Gyrus(颗粒回)是大脑海马回中的一个结构,是海马的一部分。海马是位于大脑内部、与记忆和空间导航相关的重要区域。颗粒回在海马回中位于海马回旁支,并涉及到神经元的生成和神经元连接的重要过程。对Dentate Gyrus的研究有助于理解大脑的结构和功能,以及与学习和记忆相关的神经生物学机制。

代码
文本

三、估计 RNA velocity

代码
文本
[3]
import velocyto as vcy

import warnings
warnings.filterwarnings("ignore")
代码
文本

1. 加载.loom文件

代码
文本
[4]
vlm = vcy.VelocytoLoom("hgForebrainGlut.loom")
代码
文本
[5]
# 查看 vlm 对象的所有属性和方法
dir(vlm)
['A',
 'S',
 'U',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_normalize_S',
 '_normalize_Sx',
 '_normalize_U',
 '_normalize_Ux',
 '_perform_PCA_imputed',
 '_plot_pca_imputed',
 '_plot_phase_portrait',
 'adjust_totS_totU',
 'ca',
 'calculate_embedding_shift',
 'calculate_grid_arrows',
 'calculate_shift',
 'calculate_velocity',
 'cluster_ix',
 'cluster_uid',
 'custom_filter_attributes',
 'default_filter_and_norm',
 'default_fit_preparation',
 'estimate_transition_prob',
 'extrapolate_cell_at_t',
 'filter_cells',
 'filter_genes',
 'filter_genes_by_phase_portrait',
 'filter_genes_good_fit',
 'fit_gammas',
 'gene_knn_imputation',
 'initial_Ucell_size',
 'initial_cell_size',
 'knn_imputation',
 'knn_imputation_precomputed',
 'loom_filepath',
 'normalize',
 'normalize_by_size_factor',
 'normalize_by_total',
 'normalize_median',
 'perform_PCA',
 'perform_TSNE',
 'plot_arrows_embedding',
 'plot_cell_transitions',
 'plot_expression_as_color',
 'plot_fractions',
 'plot_grid_arrows',
 'plot_pca',
 'plot_phase_portraits',
 'plot_velocity_as_color',
 'predict_U',
 'prepare_markov',
 'ra',
 'reload_raw',
 'robust_size_factor',
 'run_markov',
 'score_cluster_expression',
 'score_cv_vs_mean',
 'score_detection_levels',
 'set_clusters',
 'to_hdf5']
代码
文本
[6]
print('数据矩阵的形状:\n', vlm.S.shape, '\n')
print('loom文件中的基因信息:\n', vlm.ra,'\n')
print('loom文件中的细胞信息:\n', vlm.ca)
数据矩阵的形状:
 (32738, 1720) 

loom文件中的基因信息:
 {'Accession': array(['ENSG00000237613', 'ENSG00000238009', 'ENSG00000239945', ...,
       'ENSG00000240450', 'ENSG00000172288', 'ENSG00000231141'],
      dtype=object), 'Chromosome': array(['1', '1', '1', ..., 'Y', 'Y', 'Y'], dtype=object), 'End': array([   36081,   133566,    91105, ..., 27632852, 27771049, 27879535]), 'Gene': array(['FAM138A', 'RP11-34P13.7', 'RP11-34P13.8', ..., 'CSPG4P1Y', 'CDY1',
       'TTTY3'], dtype=object), 'Start': array([   34554,    89295,    89551, ..., 27629055, 27768264, 27874637]), 'Strand': array(['-', '-', '-', ..., '+', '+', '+'], dtype=object)} 

loom文件中的细胞信息:
 {'CellID': array(['10X_17_028:AACCATGGTAATCACCx', '10X_17_028:AACCATGCATACTACGx',
       '10X_17_028:AAACCTGGTAAAGGAGx', ...,
       '10X_17_029:TTTGGTTGTACCCAATx', '10X_17_029:TTTCCTCCAGTCCTTCx',
       '10X_17_029:TTTGCGCCACAGATTCx'], dtype=object), 'Clusters': array([3, 3, 1, ..., 6, 0, 1])}
代码
文本
[7]
# mormalization
vlm.normalize("S", size=True, log=True)
vlm.S_norm # contains log normalized
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
代码
文本
[8]
# spliced/unspliced fractions
vlm.plot_fractions()
代码
文本

2. 数据质控

代码
文本
[9]
# remove the cells with extremely low unspliced detection
import numpy as np
vlm.filter_cells(bool_array=vlm.initial_Ucell_size > np.percentile(vlm.initial_Ucell_size, 0.5))
代码
文本
[10]
# select relevant features for the downstream analysis
# make velocyto aware of the clusters annotation
vlm.set_clusters(vlm.ca["Clusters"])
代码
文本
[11]
# using the clustering annotation select the genes that are expressed above a threshold of total number of molecules in any of the clusters
vlm.score_detection_levels(min_expr_counts=40, min_cells_express=30)
vlm.filter_genes(by_detection_levels=True)
代码
文本
[12]
# perform feature selection
vlm.score_cv_vs_mean(3000, plot=True, max_expr_avg=35)
vlm.filter_genes(by_cv_vs_mean=True)
代码
文本
[13]
# normalize our data by size (total molecule count)
vlm._normalize_S(relative_size=vlm.S.sum(0),
target_size=vlm.S.sum(0).mean())
vlm._normalize_U(relative_size=vlm.U.sum(0),
target_size=vlm.U.sum(0).mean())
代码
文本

3. Gamma拟合

For the preparation of the gamma fit we smooth the data using a kNN neighbors pooling approach. kNN neighbors can be calculated directly in gene expression space or reduced PCA space, using either correlation distance or euclidean distance. One example of set of parameters is provided below.

代码
文本
[14]
len(vlm.ca['CellID'])
1711
代码
文本
[15]
vlm.perform_PCA()
vlm.knn_imputation(n_pca_dims=20, k=500, balanced=True, b_sight=1000, b_maxl=1500, n_jobs=2)
代码
文本

4. Gamma 拟合和extrapolation

代码
文本
[16]
# fit gamma to every gene
vlm.fit_gammas()
代码
文本
[17]
# visualize
vlm.plot_phase_portraits(['HES4', 'HES5'])
代码
文本

The calculate velocity and extrapolate the future state of the cells:

代码
文本
[18]
# The calculate velocity and extrapolate the future state of the cells:
vlm.predict_U()
vlm.calculate_velocity()
vlm.calculate_shift(assumption="constant_velocity")
vlm.extrapolate_cell_at_t(delta_t=1.)
代码
文本
[19]
# In alternative extrapolation can be performed using the constant unspliced assumption (for more information consult our preprint)
vlm.calculate_shift(assumption="constant_unspliced", delta_t=10)
vlm.extrapolate_cell_at_t(delta_t=1.)
代码
文本

5. 投影

代码
文本
[20]
from sklearn.manifold import TSNE
bh_tsne = TSNE()
vlm.ts = bh_tsne.fit_transform(vlm.pcs[:, :25])
代码
文本
[21]
# project on vlm.ts
vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", psc=1,
n_neighbors=1000, knn_random=True, sampled_fraction=0.5)
vlm.calculate_embedding_shift(sigma_corr = 0.05, expression_scaling=True)
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
代码
文本
[22]
import matplotlib.pyplot as plt
vlm.calculate_grid_arrows(smooth=0.8, steps=(40, 40), n_neighbors=300)
plt.figure(None,(20,10))
vlm.plot_grid_arrows(quiver_scale=0.6,
scatter_kwargs_dict={"alpha":0.35, "lw":0.35, "edgecolor":"0.4", "s":38, "rasterized":True}, min_mass=24, angles='xy', scale_units='xy',
headaxislength=2.75, headlength=5, headwidth=4.8, minlength=1.5,
plot_random=True, scale_type="absolute")
代码
文本

参考资料:

  • Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E. Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E. Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson, Peter Kharchenko Nature 2018; doi: 10.1038/s41586-018-0414-6
  • 官方文档:http://velocyto.org/
  • python教程:http://velocyto.org/velocyto.py/index.html
  • 更多实战见:https://github.com/velocyto-team/velocyto-notebooks
代码
文本
生物信息学
RNA velocity
scRNA-seq
python包
R包
命令行
生物信息学RNA velocityscRNA-seqpython包R包命令行
已赞2
本文被以下合集收录
细胞动力学
liyongge
更新于 2024-04-01
6 篇1 人关注
推荐阅读
公开
scATAC数据原始数据预处理流程
生物信息学scATAC
生物信息学scATAC
zhouziyi
发布于 2023-11-12
公开
探索全外显子测序数据分析流程:从数据预处理到变异分析
生物信息学wes
生物信息学wes
孙楠
发布于 2023-09-11
3 赞4 转存文件