空间站广场

论文

Notebooks

比赛

课程

Apps

我的主页

我的Notebooks

我的论文库

我的足迹

我的工作空间

任务

节点

文件

数据集

镜像

项目

数据库

公开

velocyto：揭示 scRNA-seq 数据的 RNA velocity

生物信息学

RNA velocity

scRNA-seq

python包

R包

命令行

生物信息学RNA velocityscRNA-seqpython包R包命令行

孙楠

发布于 2023-11-22

推荐镜像 :Basic Image:bohrium-notebook:2023-03-26

推荐机型 :c4_m16_cpu

velocyto：揭示 scRNA-seq 数据的 RNA velocity

一、安装 velocyto

二、生成分析所需要的.loom格式的文件

三、估计 RNA velocity

1. 加载.loom文件

2. 数据质控

3. Gamma拟合

4. Gamma 拟合和extrapolation

5. 投影

参考资料：

velocyto：揭示 scRNA-seq 数据的 RNA velocity

代码

文本

Velocyto 是一个用于分析 scRNA-seq 数据中表达动态的软件包。特别是，它可以通过区分标准单细胞 RNA 测序方案中未剪接和剪接的 mRNA 来估计单细胞的 RNA 速度。

原始论文：La Manno, Gioele, et al. RNA velocity of single cells. Nature 560.7719 (2018): 494-498. https://doi.org/10.1038/s41586-018-0414-6.

转录动力学模型：

截屏2023-11-22 下午4.14.45.png

RNA velocity 理论描述：

截屏2023-11-22 下午4.11.05.png

截屏2023-11-22 下午4.11.26.png

代码

文本

📖 上手指南
本文档可在 Bohrium Notebook 上直接运行。你可以点击界面上方按钮开始连接，选择 `bohrium-notebook:2023-03-26` 镜像和 `c4_m16_cpu` 节点配置，稍等片刻选择 `R kernel` 即可运行。

代码

文本

一、安装 velocyto

代码

文本

[1]

!pip install velocyto

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting velocyto
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/66/e8fff9d3b824fd99c0f678c47c740fec058ce2d9a0cfdf11b114ea8889f2/velocyto-0.17.17.tar.gz (198 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 198.9/198.9 kB 1.4 MB/s eta 0:00:00a 0:00:01
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.23.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.7.3)
Requirement already satisfied: cython in /opt/conda/lib/python3.8/site-packages (from velocyto) (0.29.33)
Requirement already satisfied: numba in /opt/conda/lib/python3.8/site-packages (from velocyto) (0.56.4)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.8/site-packages (from velocyto) (3.7.1)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.0.2)
Requirement already satisfied: h5py in /opt/conda/lib/python3.8/site-packages (from velocyto) (3.1.0)
Collecting loompy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f0/e3/8dc87471b34bc0db4e72f51a7aa0b454b3b9d551e15900862c022050aca3/loompy-3.0.7.tar.gz (4.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 20.2 MB/s eta 0:00:00a 0:00:01
  Preparing metadata (setup.py) ... done
Collecting pysam
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/32/d1/a2d1cebe6c4f3acaf973d1fedcf7bf29209a4a4e8d5a9f2b9c5b20a2bcad/pysam-0.22.0-cp38-cp38-manylinux_2_28_x86_64.whl (24.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.4/24.4 MB 21.4 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: Click in /opt/conda/lib/python3.8/site-packages (from velocyto) (7.1.2)
Requirement already satisfied: pandas in /opt/conda/lib/python3.8/site-packages (from velocyto) (1.5.3)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.8/site-packages (from loompy->velocyto) (65.6.3)
Collecting numpy-groupies
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/17/e0/1b8ed88ed696cf5710fc6d28f33a651fad9bdf183c3012eedd757e461be4/numpy_groupies-0.9.22.tar.gz (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.3/53.3 kB 15.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (0.11.0)
Requirement already satisfied: importlib-resources>=3.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (5.2.0)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (3.0.9)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (1.0.5)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (1.4.4)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (4.38.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib->velocyto) (23.0)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.8/site-packages (from numba->velocyto) (6.0.0)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /opt/conda/lib/python3.8/site-packages (from numba->velocyto) (0.39.1)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.8/site-packages (from pandas->velocyto) (2022.7)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn->velocyto) (3.1.0)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn->velocyto) (1.2.0)
Collecting numpy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2f/14/abc14a3f3663739e5d3c8fd980201d10788d75fea5b0685734227052c4f0/numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.9/16.9 MB 28.7 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: zipp>=3.1.0 in /opt/conda/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib->velocyto) (3.14.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib->velocyto) (1.16.0)
Building wheels for collected packages: velocyto, loompy, numpy-groupies
  Building wheel for velocyto (setup.py) ... done
  Created wheel for velocyto: filename=velocyto-0.17.17-cp38-cp38-linux_x86_64.whl size=523758 sha256=e636d559adc3ed168d936689b88531732dae8e44a2efc7a3a7242c50d5b74a87
  Stored in directory: /root/.cache/pip/wheels/c1/3c/6e/68cd9bcfab44bc727185de0d03b2dfba2eef788e32d3b8d30f
  Building wheel for loompy (setup.py) ... done
  Created wheel for loompy: filename=loompy-3.0.7-py3-none-any.whl size=52018 sha256=7ec9292930abf525f221dd319d8d2b5306baa415b3aaca76d8f4b4530bbde3c9
  Stored in directory: /root/.cache/pip/wheels/bc/e7/b6/94504a02721c11aabf8124040e98ccdf78dee462ad3e7c78d2
  Building wheel for numpy-groupies (setup.py) ... done
  Created wheel for numpy-groupies: filename=numpy_groupies-0.9.22-py3-none-any.whl size=25846 sha256=b63872d7266b3cd9781e7796da5149f9e071fbd770943fc8cf328e9a65ff163f
  Stored in directory: /root/.cache/pip/wheels/88/a5/bc/0c26522b97154b79bcce8dfd11ebb1e043a3a1ecfeb3a0ed69
Successfully built velocyto loompy numpy-groupies
Installing collected packages: pysam, numpy, numpy-groupies, loompy, velocyto
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.5
    Uninstalling numpy-1.23.5:
      Successfully uninstalled numpy-1.23.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
moviepy 0.2.3.5 requires decorator<5.0,>=4.0.2, but you have decorator 5.1.1 which is incompatible.
cvxpy 1.2.3 requires setuptools<=64.0.2, but you have setuptools 65.6.3 which is incompatible.
Successfully installed loompy-3.0.7 numpy-1.22.4 numpy-groupies-0.9.22 pysam-0.22.0 velocyto-0.17.17
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

代码

文本

二、生成分析所需要的.loom格式的文件

代码

文本

velocyto 支持命令行方式从.bam/.sam文件生成 spliced/unspliced 计数矩阵的.loom文件，具体使用方式见：http://velocyto.org/velocyto.py/tutorial/cli.html#introduction。

本文档用现成的.loom格式文件：

代码

文本

[2]

from urllib.request import urlretrieve

urlretrieve("http://pklab.med.harvard.edu/velocyto/hgForebrainGlut/hgForebrainGlut.loom", "hgForebrainGlut.loom")

('hgForebrainGlut.loom', <http.client.HTTPMessage at 0x7fcbcb1cd490>)

代码

文本

Dentate Gyrus（颗粒回）是大脑海马回中的一个结构，是海马的一部分。海马是位于大脑内部、与记忆和空间导航相关的重要区域。颗粒回在海马回中位于海马回旁支，并涉及到神经元的生成和神经元连接的重要过程。对Dentate Gyrus的研究有助于理解大脑的结构和功能，以及与学习和记忆相关的神经生物学机制。

代码

文本

三、估计 RNA velocity

代码

文本

[3]

import velocyto as vcy

import warnings

warnings.filterwarnings("ignore")

代码

文本

1. 加载.loom文件

代码

文本

[4]

vlm = vcy.VelocytoLoom("hgForebrainGlut.loom")

代码

文本

[5]

# 查看 vlm 对象的所有属性和方法

dir(vlm)

['A',
 'S',
 'U',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_normalize_S',
 '_normalize_Sx',
 '_normalize_U',
 '_normalize_Ux',
 '_perform_PCA_imputed',
 '_plot_pca_imputed',
 '_plot_phase_portrait',
 'adjust_totS_totU',
 'ca',
 'calculate_embedding_shift',
 'calculate_grid_arrows',
 'calculate_shift',
 'calculate_velocity',
 'cluster_ix',
 'cluster_uid',
 'custom_filter_attributes',
 'default_filter_and_norm',
 'default_fit_preparation',
 'estimate_transition_prob',
 'extrapolate_cell_at_t',
 'filter_cells',
 'filter_genes',
 'filter_genes_by_phase_portrait',
 'filter_genes_good_fit',
 'fit_gammas',
 'gene_knn_imputation',
 'initial_Ucell_size',
 'initial_cell_size',
 'knn_imputation',
 'knn_imputation_precomputed',
 'loom_filepath',
 'normalize',
 'normalize_by_size_factor',
 'normalize_by_total',
 'normalize_median',
 'perform_PCA',
 'perform_TSNE',
 'plot_arrows_embedding',
 'plot_cell_transitions',
 'plot_expression_as_color',
 'plot_fractions',
 'plot_grid_arrows',
 'plot_pca',
 'plot_phase_portraits',
 'plot_velocity_as_color',
 'predict_U',
 'prepare_markov',
 'ra',
 'reload_raw',
 'robust_size_factor',
 'run_markov',
 'score_cluster_expression',
 'score_cv_vs_mean',
 'score_detection_levels',
 'set_clusters',
 'to_hdf5']

代码

文本

[6]

print('数据矩阵的形状:\n', vlm.S.shape, '\n')

print('loom文件中的基因信息:\n', vlm.ra,'\n')

print('loom文件中的细胞信息:\n', vlm.ca)

数据矩阵的形状:
 (32738, 1720) 

loom文件中的基因信息:
 {'Accession': array(['ENSG00000237613', 'ENSG00000238009', 'ENSG00000239945', ...,
       'ENSG00000240450', 'ENSG00000172288', 'ENSG00000231141'],
      dtype=object), 'Chromosome': array(['1', '1', '1', ..., 'Y', 'Y', 'Y'], dtype=object), 'End': array([   36081,   133566,    91105, ..., 27632852, 27771049, 27879535]), 'Gene': array(['FAM138A', 'RP11-34P13.7', 'RP11-34P13.8', ..., 'CSPG4P1Y', 'CDY1',
       'TTTY3'], dtype=object), 'Start': array([   34554,    89295,    89551, ..., 27629055, 27768264, 27874637]), 'Strand': array(['-', '-', '-', ..., '+', '+', '+'], dtype=object)} 

loom文件中的细胞信息:
 {'CellID': array(['10X_17_028:AACCATGGTAATCACCx', '10X_17_028:AACCATGCATACTACGx',
       '10X_17_028:AAACCTGGTAAAGGAGx', ...,
       '10X_17_029:TTTGGTTGTACCCAATx', '10X_17_029:TTTCCTCCAGTCCTTCx',
       '10X_17_029:TTTGCGCCACAGATTCx'], dtype=object), 'Clusters': array([3, 3, 1, ..., 6, 0, 1])}

代码

文本

[7]

# mormalization

vlm.normalize("S", size=True, log=True)

vlm.S_norm # contains log normalized

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

代码

文本

[8]

# spliced/unspliced fractions

vlm.plot_fractions()

代码

文本

2. 数据质控

代码

文本

[9]

# remove the cells with extremely low unspliced detection

import numpy as np

vlm.filter_cells(bool_array=vlm.initial_Ucell_size > np.percentile(vlm.initial_Ucell_size, 0.5))

代码

文本

[10]

# select relevant features for the downstream analysis

# make velocyto aware of the clusters annotation

vlm.set_clusters(vlm.ca["Clusters"])

代码

文本

[11]

# using the clustering annotation select the genes that are expressed above a threshold of total number of molecules in any of the clusters

vlm.score_detection_levels(min_expr_counts=40, min_cells_express=30)

vlm.filter_genes(by_detection_levels=True)

代码

文本

[12]

# perform feature selection

vlm.score_cv_vs_mean(3000, plot=True, max_expr_avg=35)

vlm.filter_genes(by_cv_vs_mean=True)

代码

文本

[13]

# normalize our data by size (total molecule count)

vlm._normalize_S(relative_size=vlm.S.sum(0),

target_size=vlm.S.sum(0).mean())

vlm._normalize_U(relative_size=vlm.U.sum(0),

target_size=vlm.U.sum(0).mean())

代码

文本

3. Gamma拟合

For the preparation of the gamma fit we smooth the data using a kNN neighbors pooling approach. kNN neighbors can be calculated directly in gene expression space or reduced PCA space, using either correlation distance or euclidean distance. One example of set of parameters is provided below.

代码

文本

[14]

len(vlm.ca['CellID'])

代码

文本

[15]

vlm.perform_PCA()

vlm.knn_imputation(n_pca_dims=20, k=500, balanced=True, b_sight=1000, b_maxl=1500, n_jobs=2)

代码

文本

4. Gamma 拟合和extrapolation

代码

文本

[16]

# fit gamma to every gene

vlm.fit_gammas()

代码

文本

[17]

# visualize

vlm.plot_phase_portraits(['HES4', 'HES5'])

代码

文本

The calculate velocity and extrapolate the future state of the cells:

代码

文本

[18]

# The calculate velocity and extrapolate the future state of the cells:

vlm.predict_U()

vlm.calculate_velocity()

vlm.calculate_shift(assumption="constant_velocity")

vlm.extrapolate_cell_at_t(delta_t=1.)

代码

文本

[19]

# In alternative extrapolation can be performed using the constant unspliced assumption (for more information consult our preprint)

vlm.calculate_shift(assumption="constant_unspliced", delta_t=10)

vlm.extrapolate_cell_at_t(delta_t=1.)

代码

文本

5. 投影

代码

文本

[20]

from sklearn.manifold import TSNE

bh_tsne = TSNE()

vlm.ts = bh_tsne.fit_transform(vlm.pcs[:, :25])

代码

文本

[21]

# project on vlm.ts

vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", psc=1,

n_neighbors=1000, knn_random=True, sampled_fraction=0.5)

vlm.calculate_embedding_shift(sigma_corr = 0.05, expression_scaling=True)

WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.

代码

文本

[22]

import matplotlib.pyplot as plt

vlm.calculate_grid_arrows(smooth=0.8, steps=(40, 40), n_neighbors=300)

plt.figure(None,(20,10))

vlm.plot_grid_arrows(quiver_scale=0.6,

scatter_kwargs_dict={"alpha":0.35, "lw":0.35, "edgecolor":"0.4", "s":38, "rasterized":True}, min_mass=24, angles='xy', scale_units='xy',

headaxislength=2.75, headlength=5, headwidth=4.8, minlength=1.5,

plot_random=True, scale_type="absolute")

代码

文本

参考资料：

Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E. Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E. Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson, Peter Kharchenko Nature 2018; doi: 10.1038/s41586-018-0414-6
官方文档：http://velocyto.org/
python教程：http://velocyto.org/velocyto.py/index.html
更多实战见：https://github.com/velocyto-team/velocyto-notebooks

代码

文本

生物信息学

RNA velocity

scRNA-seq

python包

R包

命令行

生物信息学RNA velocityscRNA-seqpython包R包命令行

已赞2

本文被以下合集收录

细胞动力学

liyongge

更新于 2024-04-01

6 篇1 人关注