摘要
Single-cell RNA sequencing(scRNA-seq)technology has become an effective tool for high-throughout transcriptomic study,which circumvents the averaging artifacts corresponding to bulk RNA-seq technology,yielding new perspectives on the cellular diversity of potential superficially homogeneous populations.Although various sequencing techniques have decreased the amplification bias and improved capture efficiency caused by the low amount of starting material,the technical noise and biological variation are inevitably introduced into experimental process,resulting in high dropout events,which greatly hinder the downstream analysis.Considering the bimodal expression pattern and the right-skewed characteristic existed in normalized scRNA-seq data,we propose a customized autoencoder based on a twopart-generalized-gamma distribution(AE-TPGG)for scRNAseq data analysis,which takes mixed discrete-continuous random variables of scRNA-seq data into account using a twopart model and utilizes the generalized gamma(GG)distribution,for fitting the positive and right-skewed continuous data.The adopted autoencoder enables AE-TPGG to captures the inherent relationship between genes.In addition to the ability of achieving low-dimensional representation,the AETPGG model also provides a denoised imputation according to statistical characteristic of gene expression.Results on real datasets demonstrate that our proposed model is competitive to current imputation methods and ameliorates a diverse set of typical scRNA-seq data analyses.
基金
This research was supported by the National Natural Science Foundation of China(Grant Nos.62136004,61802193)
the National Key R&D Program of China(2018YFC2001600,2018YFC2001602)
the Natural Science Foundation of Jiangsu Province(BK20170934)
the Fundamental Research Funds for the Central Universities(NJ2020023)。