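velocyto: revealing RNA velocity in scRNA-seq data
velocyto is a software package for analyzing expression dynamics in scRNA-seq data. In particular, it estimates the RNA velocity of single cells by distinguishing unspliced from spliced mRNAs in standard single-cell RNA sequencing protocols.
Original paper: La Manno, Gioele, et al. RNA velocity of single cells. Nature 560.7719 (2018): 494-498. https://doi.org/10.1038/s41586-018-0414-6.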
Model of transcriptional dynamics:
Theoretical description of RNA velocity:
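For reference, the model in La Manno et al. (2018) describes each gene with two rate equations. Here u and s are the unspliced and spliced mRNA abundances, α is the transcription rate, β the splicing rate and γ the degradation rate. The last line assumes time is rescaled so that β = 1, which is the convention under which the steady-state slope of u against s equals γ.

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha(t) - \beta\,u(t) \\
\frac{ds}{dt} &= \beta\,u(t) - \gamma\,s(t) \\
v &= \frac{ds}{dt} = u - \gamma\,s \qquad (\beta = 1)
\end{aligned}
```

A cell with more unspliced mRNA than the steady-state line predicts (u > γs) has positive velocity for that gene, meaning the gene is being up-regulated; u < γs indicates down-regulation.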
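📖 Getting started
This document can be run directly on Bohrium Notebook. Click the 开始连接 (Connect) button at the top of the page, choose the `bohrium-notebook:2023-03-26` image and the `c4_m16_cpu` node configuration, wait a moment, then select the `R kernel` to run it.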
I. Installing velocyto
Install output (abridged: download progress bars, "Requirement already satisfied" lines and wheel hashes omitted):
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting velocyto
Collecting loompy
Collecting pysam
Collecting numpy-groupies
Collecting numpy
Building wheels for collected packages: velocyto, loompy, numpy-groupies
Successfully built velocyto loompy numpy-groupies
Installing collected packages: pysam, numpy, numpy-groupies, loompy, velocyto
Attempting uninstall: numpy
Found existing installation: numpy 1.23.5
Uninstalling numpy-1.23.5:
Successfully uninstalled numpy-1.23.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
moviepy 0.2.3.5 requires decorator<5.0,>=4.0.2, but you have decorator 5.1.1 which is incompatible.
cvxpy 1.2.3 requires setuptools<=64.0.2, but you have setuptools 65.6.3 which is incompatible.
Successfully installed loompy-3.0.7 numpy-1.22.4 numpy-groupies-0.9.22 pysam-0.22.0 velocyto-0.17.17
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
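The command that produced the log above is not shown; a plain `pip install velocyto` is the usual route, but that is an assumption about the missing cell. Because the install replaced numpy 1.23.5 with 1.22.4, it can be worth checking which versions ended up in the environment. The helper below is a small sketch using only the standard library; the name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages touched by the install log.
for name in ("velocyto", "loompy", "pysam", "numpy"):
    print(name, installed_version(name))
```

If numpy was already imported before the install, restarting the kernel makes sure the downgraded version is the one in use.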
II. Generating the .loom file needed for the analysis
velocyto provides a command-line interface that generates a .loom file of spliced/unspliced count matrices from .bam/.sam files. For usage, see http://velocyto.org/velocyto.py/tutorial/cli.html#introduction.
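As a rough illustration of the command-line step, the sketch below assembles a `velocyto run10x` call, the subcommand intended for a Cell Ranger output folder. Every path is a hypothetical placeholder, and the helper `run10x_command` is invented here; the linked CLI tutorial remains the reference for the available options.

```python
import shlex


def run10x_command(sample_folder, gtf_file, mask_gtf=None):
    """Build the argument list for `velocyto run10x`.

    sample_folder: Cell Ranger output folder (hypothetical path).
    gtf_file:      genome annotation used for counting (hypothetical path).
    mask_gtf:      optional annotation of repeats to mask, passed with -m.
    """
    cmd = ["velocyto", "run10x"]
    if mask_gtf is not None:
        cmd += ["-m", mask_gtf]
    cmd += [sample_folder, gtf_file]
    return cmd


cmd = run10x_command("path/to/sample01", "path/to/genes.gtf",
                     mask_gtf="path/to/repeat_msk.gtf")
print(shlex.join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.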
This document uses a ready-made .loom file:
('hgForebrainGlut.loom', <http.client.HTTPMessage at 0x7fcbcb1cd490>)
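The tuple above has the shape of a return value from `urllib.request.urlretrieve`. A download cell along the following lines would produce it; the URL is an assumption (it is the location used by the velocyto example notebooks, not something stated in this document), and `fetch` is a hypothetical helper that skips the download when the file is already present.

```python
import os
from urllib.request import urlretrieve

# Assumed location of the example data; not given in this document.
LOOM_URL = "http://pklab.med.harvard.edu/velocyto/hgForebrainGlut/hgForebrainGlut.loom"


def fetch(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if not os.path.exists(dest):
        urlretrieve(url, dest)
    return dest


# Uncomment to download the example file into the working directory:
# fetch(LOOM_URL, "hgForebrainGlut.loom")
```

Checking for an existing file first avoids repeating a large download every time the notebook is re-run.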
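The dentate gyrus is a structure within the hippocampal formation of the brain. The hippocampus lies deep inside the brain and is important for memory and spatial navigation. The dentate gyrus is involved in the generation of new neurons and in the formation of neuronal connections. Studying it helps in understanding the structure and function of the brain, as well as the neurobiological mechanisms underlying learning and memory.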
III. Estimating RNA velocity
1. Loading the .loom file
['A', 'S', 'U', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_normalize_S', '_normalize_Sx', '_normalize_U', '_normalize_Ux', '_perform_PCA_imputed', '_plot_pca_imputed', '_plot_phase_portrait', 'adjust_totS_totU', 'ca', 'calculate_embedding_shift', 'calculate_grid_arrows', 'calculate_shift', 'calculate_velocity', 'cluster_ix', 'cluster_uid', 'custom_filter_attributes', 'default_filter_and_norm', 'default_fit_preparation', 'estimate_transition_prob', 'extrapolate_cell_at_t', 'filter_cells', 'filter_genes', 'filter_genes_by_phase_portrait', 'filter_genes_good_fit', 'fit_gammas', 'gene_knn_imputation', 'initial_Ucell_size', 'initial_cell_size', 'knn_imputation', 'knn_imputation_precomputed', 'loom_filepath', 'normalize', 'normalize_by_size_factor', 'normalize_by_total', 'normalize_median', 'perform_PCA', 'perform_TSNE', 'plot_arrows_embedding', 'plot_cell_transitions', 'plot_expression_as_color', 'plot_fractions', 'plot_grid_arrows', 'plot_pca', 'plot_phase_portraits', 'plot_velocity_as_color', 'predict_U', 'prepare_markov', 'ra', 'reload_raw', 'robust_size_factor', 'run_markov', 'score_cluster_expression', 'score_cv_vs_mean', 'score_detection_levels', 'set_clusters', 'to_hdf5']
Shape of the data matrix: (32738, 1720)
Gene information in the loom file: {'Accession': array(['ENSG00000237613', 'ENSG00000238009', 'ENSG00000239945', ..., 'ENSG00000240450', 'ENSG00000172288', 'ENSG00000231141'], dtype=object), 'Chromosome': array(['1', '1', '1', ..., 'Y', 'Y', 'Y'], dtype=object), 'End': array([ 36081, 133566, 91105, ..., 27632852, 27771049, 27879535]), 'Gene': array(['FAM138A', 'RP11-34P13.7', 'RP11-34P13.8', ..., 'CSPG4P1Y', 'CDY1', 'TTTY3'], dtype=object), 'Start': array([ 34554, 89295, 89551, ..., 27629055, 27768264, 27874637]), 'Strand': array(['-', '-', '-', ..., '+', '+', '+'], dtype=object)}
Cell information in the loom file: {'CellID': array(['10X_17_028:AACCATGGTAATCACCx', '10X_17_028:AACCATGCATACTACGx', '10X_17_028:AAACCTGGTAAAGGAGx', ..., '10X_17_029:TTTGGTTGTACCCAATx', '10X_17_029:TTTCCTCCAGTCCTTCx', '10X_17_029:TTTGCGCCACAGATTCx'], dtype=object), 'Clusters': array([3, 3, 1, ..., 6, 0, 1])}
array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]])
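The three outputs above are consistent with opening the file as a `VelocytoLoom` object, listing its attributes with `dir`, and then printing `S.shape`, `ra` and `ca`. The sketch below shows one way to do that; `load_loom` and `summarize` are hypothetical helper names, and velocyto is imported lazily so that the summary helper can be tried on a stand-in object without the real file.

```python
from types import SimpleNamespace


def load_loom(path):
    """Open a .loom file as a VelocytoLoom object (requires velocyto)."""
    import velocyto as vcy  # imported here so the rest runs without velocyto
    return vcy.VelocytoLoom(path)


def summarize(vlm):
    """Collect the matrix shape and attribute names of a VelocytoLoom-like object."""
    n_genes, n_cells = vlm.S.shape  # S is the spliced matrix, genes x cells
    return {
        "n_genes": n_genes,
        "n_cells": n_cells,
        "gene_attrs": sorted(vlm.ra),  # row attributes, e.g. Gene, Accession
        "cell_attrs": sorted(vlm.ca),  # column attributes, e.g. CellID, Clusters
    }


# Stand-in with the same shape as the dataset above, used only to try the helper.
demo = SimpleNamespace(
    S=SimpleNamespace(shape=(32738, 1720)),
    ra={"Gene": [], "Accession": []},
    ca={"CellID": [], "Clusters": []},
)
print(summarize(demo))
```

With the real data, `vlm = load_loom("hgForebrainGlut.loom")` followed by `summarize(vlm)` gives the same kind of summary; `vlm.S`, `vlm.U` and `vlm.A` hold the spliced, unspliced and ambiguous count matrices.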
2. Quality control
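The quality-control cell is not shown here, so the following is only a sketch of a typical gene-filtering sequence built from `VelocytoLoom` methods. The threshold values are illustrative assumptions rather than settings tuned for this dataset, and `basic_qc` is a hypothetical wrapper name.

```python
def basic_qc(vlm, min_expr_counts=40, min_cells_express=30, n_top_genes=3000):
    """Filter genes on a VelocytoLoom object in place and return it.

    All thresholds are illustrative assumptions.
    """
    # Use the cluster labels stored in the file to colour later plots.
    vlm.set_clusters(vlm.ca["Clusters"])
    # Keep genes with enough spliced molecules detected in enough cells.
    vlm.score_detection_levels(min_expr_counts=min_expr_counts,
                               min_cells_express=min_cells_express)
    vlm.filter_genes(by_detection_levels=True)
    # Keep the genes that are most variable for their mean expression.
    vlm.score_cv_vs_mean(n_top_genes, plot=False)
    vlm.filter_genes(by_cv_vs_mean=True)
    return vlm
```

The two-stage filter first removes genes that are too rarely detected to fit reliably, then narrows the set to genes whose variation is likely to carry signal.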
3. Preparing for the gamma fit
To prepare for the gamma fit, we smooth the data by pooling counts over each cell's k nearest neighbors (kNN). The neighbors can be computed either directly in gene expression space or in reduced PCA space, using either correlation distance or Euclidean distance. One example set of parameters is given below.
1711
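A smoothing step of the kind described above might look like the sketch below. The parameter values are illustrative assumptions, and `prepare_for_gamma_fit` is a hypothetical wrapper name; `pca_space` and `metric` are the arguments that select between expression space and PCA space and between Euclidean and correlation distance.

```python
def prepare_for_gamma_fit(vlm, n_pca_dims=20, k=500, n_jobs=4):
    """Normalize, reduce with PCA, then pool counts over kNN neighbors.

    Parameter values are illustrative assumptions.
    """
    # Size-normalize and log-transform the spliced and unspliced matrices.
    vlm.normalize("both", size=True, log=True)
    # PCA on the normalized spliced counts.
    vlm.perform_PCA()
    # pca_space=True searches neighbors in the reduced PCA space;
    # metric can be "euclidean" or "correlation".
    vlm.knn_imputation(k=k, n_pca_dims=n_pca_dims, pca_space=True,
                       metric="euclidean", n_jobs=n_jobs)
    return vlm
```

A larger `k` gives smoother phase portraits at the cost of blurring differences between neighboring cell states, so it should stay well below the number of cells in the smallest population of interest.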
4. Gamma fit and extrapolation
Then calculate the velocity and extrapolate the future state of the cells:
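The idea behind this step can be sketched in a few lines of plain Python before turning to the library calls. The toy functions below fit γ for one gene as the least-squares slope through the origin of unspliced against spliced counts, then extrapolate under a constant-velocity assumption. This is a simplification for illustration, not velocyto's actual fitting procedure; the wrapper `velocity_and_extrapolation` is a hypothetical name around existing `VelocytoLoom` methods.

```python
def fit_gamma(s, u):
    """Least-squares slope through the origin of u against s, for one gene."""
    return sum(si * ui for si, ui in zip(s, u)) / sum(si * si for si in s)


def extrapolate(s, u, gamma, dt=1.0):
    """Constant-velocity extrapolation: s(t + dt) = s + (u - gamma * s) * dt."""
    return [si + (ui - gamma * si) * dt for si, ui in zip(s, u)]


# Toy steady-state cells for one gene (made-up numbers).
spliced = [1.0, 2.0, 4.0]
unspliced = [0.5, 1.0, 2.0]
gamma = fit_gamma(spliced, unspliced)
# A cell with excess unspliced mRNA is predicted to gain spliced mRNA.
print(gamma, extrapolate([2.0], [2.0], gamma, dt=1.0))


def velocity_and_extrapolation(vlm, delta_t=1.0):
    """Run the corresponding steps on a VelocytoLoom object."""
    vlm.fit_gammas()          # per-gene steady-state slope
    vlm.predict_U()           # unspliced level expected at steady state
    vlm.calculate_velocity()  # observed minus expected unspliced
    vlm.calculate_shift(assumption="constant_velocity")
    vlm.extrapolate_cell_at_t(delta_t=delta_t)
    return vlm
```

The `delta_t` value sets how far ahead each cell is projected; it is in arbitrary units, so only the direction and relative size of the shift are interpretable.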
5. Projection
WARNING:root:Nans encountered in corrcoef and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
WARNING:root:Nans encountered in corrcoef_random and corrected to 1s. If not identical cells were present it is probably a small isolated cluster converging after imputation.
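The projection cell is not shown; the warnings above appear to come from the correlation step that estimates transition probabilities. A projection onto a 2-D embedding could be sketched as follows. The parameter values are illustrative assumptions and `project_on_embedding` is a hypothetical wrapper name.

```python
def project_on_embedding(vlm, n_neighbors=150, sigma_corr=0.05):
    """Project velocities onto a t-SNE embedding and draw a grid of arrows.

    Parameter values are illustrative assumptions.
    """
    # Compute a 2-D t-SNE embedding; the coordinates are stored in vlm.ts.
    vlm.perform_TSNE()
    # Correlate each cell's extrapolated shift with the direction to its neighbors.
    vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt",
                                 psc=1, n_neighbors=n_neighbors,
                                 knn_random=True, sampled_fraction=0.5)
    # Turn the transition probabilities into per-cell arrows on the embedding.
    vlm.calculate_embedding_shift(sigma_corr=sigma_corr, expression_scaling=True)
    # Average the arrows on a regular grid for a readable vector field.
    vlm.calculate_grid_arrows(smooth=0.8, steps=(40, 40), n_neighbors=100)
    vlm.plot_grid_arrows()
    return vlm
```

`n_neighbors` must be smaller than the number of cells remaining after filtering, so it is worth checking the cell count before choosing it.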
References:
- Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E. Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E. Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson, Peter Kharchenko. RNA velocity of single cells. Nature 2018; doi: 10.1038/s41586-018-0414-6
- Official documentation: http://velocyto.org/
- Python tutorial: http://velocyto.org/velocyto.py/index.html
- More worked examples: https://github.com/velocyto-team/velocyto-notebooks