Single-cell RNA-seq (scRNA-seq) measures gene expression in individual cells, which enables the detection of highly variable genes (HVGs) that contribute to cell-to-cell variation within a homogeneous cell population. HVG detection is a necessary step before clustering analysis because it improves the clustering result. scRNA-seq data include some genes that are expressed with a certain probability in all cells, which makes the cells indistinguishable. These genes are referred to as background noise. To remove the background noise and select the informative genes for clustering analysis, we propose in this paper an effective HVG detection method based on principal component analysis (PCA). The proposed method uses PCA to evaluate the genes (features) on the sample space. The distortion-free principal components are selected, and the distance from the origin to each gene in the space they span is used as the weight of that gene. The genes with the greatest distances from the origin are selected for clustering analysis. Experimental results on both synthetic and gene expression datasets show that the proposed method not only removes the background noise and selects the informative genes for clustering analysis, but also outperforms the existing HVG detection methods.
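The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a gene's coordinates are its PCA loadings scaled by the singular values, and it keeps the first `n_components` components as a stand-in for the "distortion-free" selection, whose criterion the abstract does not define. The function names `pca_gene_weights` and `select_top_genes`, the parameter `n_components`, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_gene_weights(X, n_components=2):
    """Weight each gene by its distance from the origin in PCA space.

    X is an (n_cells, n_genes) expression matrix.
    Assumption: the first n_components components are used; the paper's
    "distortion-free" selection rule is not given in the abstract.
    """
    # Center each gene (column) across cells.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD of the centered matrix: Xc = U @ diag(s) @ Vt.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    # Gene coordinates: loadings scaled by singular values, shape (n_genes, k).
    coords = Vt[:k].T * s[:k]
    # Euclidean distance of each gene from the origin.
    return np.linalg.norm(coords, axis=1)


def select_top_genes(X, n_top, n_components=2):
    """Return the indices of the n_top genes with the largest weights."""
    weights = pca_gene_weights(X, n_components)
    return np.argsort(weights)[::-1][:n_top]


# Synthetic illustration (hypothetical data): 50 genes, of which only the
# first 5 differ between two groups of cells; the rest act as background.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 50
X = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)
labels = np.repeat([0, 1], n_cells // 2)
X[:, :5] += 20.0 * labels[:, None]

top = select_top_genes(X, n_top=5)
print(sorted(top.tolist()))
```

In this toy setting the group-separating genes lie far from the origin while the background genes stay close to it, which is the behavior the abstract relies on when it ranks genes by distance.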
Funding: Supported in part by the New Energy and Industrial Technology Development Organization (AJD30064) and JST COI-NEXT.