Abstract: Differential gene expression analysis based on scRNA-seq data is challenging because of two characteristics of such data. First, multimodality and other heterogeneity in gene expression across cell conditions lead to divergences in the tails, or crossings, of the expression distributions. Second, scRNA-seq data generally contain a considerable fraction of dropout events, which cause zero inflation in the expression. With respect to the first characteristic, existing parametric approaches that target the mean difference in gene expression are limited, whereas quantile regression, which examines multiple locations in the distribution, can improve power. However, the second characteristic, zero inflation, makes traditional quantile regression invalid and underpowered. We propose a quantile-based test that handles the two characteristics, multimodality and zero inflation, simultaneously. The proposed quantile rank-score based test for differential distribution detection (ZIQRank) is derived under a two-part quantile regression model for zero-inflated outcomes. It comprises a test in logistic modeling for the zero counts and a collection of rank-score tests, adjusted for zero inflation, at multiple prespecified quantiles of the positive part. The testing decision is based on an aggregate result obtained by combining the marginal p-values with the MinP or Cauchy procedure. The proposed test is asymptotically justified and evaluated in simulation studies. It shows a higher precision-recall AUC in detecting true differentially expressed genes (DEGs) than existing methods. We apply the ZIQRank test to a TPM scRNA-seq dataset on human glioblastoma tumors and exclusively identify a group of DEGs between neoplastic and nonneoplastic cells; these genes are heterogeneously expressed and have been shown to be associated with glioma. Application to a UMI count scRNA-seq dataset on cells from mouse intestinal organoids further demonstrates the capability of ZIQRank to improve on and complement existing approaches.
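The aggregation step mentioned in the abstract, combining marginal p-values into a single decision, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `cauchy_combine` and `minp_bonferroni` are hypothetical, and the example p-values are made up for illustration. The Cauchy combination follows the standard form (weighted sum of tangent-transformed p-values, mapped back through the Cauchy distribution function). The MinP variant shown here uses a simple Bonferroni-style bound as an assumption, since the abstract does not say how the MinP null distribution is calibrated.

```python
import math

def cauchy_combine(pvalues, weights=None, eps=1e-15):
    """Cauchy combination of p-values (weights assumed to sum to 1)."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights by default
    # Clip away from 0 and 1 so the tangent transform stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    # Each p-value is mapped to a standard Cauchy variate under the null.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, clipped))
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi

def minp_bonferroni(pvalues):
    """Smallest p-value with a Bonferroni-style adjustment (an assumption)."""
    return min(1.0, len(pvalues) * min(pvalues))

# Hypothetical marginal p-values: one from the logistic (zero) part,
# then rank-score tests at a few quantiles of the positive part.
marginal = [0.20, 0.04, 0.01, 0.30]
combined_cauchy = cauchy_combine(marginal)
combined_minp = minp_bonferroni(marginal)
```

Both functions return a single combined p-value per gene, which would then be compared with the chosen significance threshold or passed to a multiple-testing correction across genes.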

Zero-Inflated Quantile Rank-Score Based Test (ZIQRank) with Application to scRNA-seq Differential Gene Expression Analysis
