Genomic data from complex samples measure the average signal across multiple cell types, and differences in cell composition bias the results of many downstream analyses. Accurately estimating cell composition is therefore recognized as an important first step in analyzing complex samples. Many computational methods have been developed to estimate cell composition, but most have limited applicability because they depend on reference data or prior information that is often unavailable. In this work, we develop a feature selection method for reference-free deconvolution. It iteratively searches for cell-type-specific features through cross-cell-type differential analysis of one cell type versus the other cell types, as well as of two cell types versus the other cell types, and then estimates cell composition. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of the proposed method.
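To make the idea of alternating between composition estimation and feature selection concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a simple linear mixing model (observed matrix Y ≈ W·P, with genes in rows and samples in columns), uses plain multiplicative-update NMF as a stand-in estimator, and scores features only with a one-versus-rest contrast (the two-versus-rest contrast described in the abstract is omitted). The function names `nmf` and `estimate_proportions` and all parameters are hypothetical.

```python
import numpy as np

def nmf(Y, k, n_iter=200, seed=0):
    """Plain multiplicative-update NMF: Y (genes x samples) ~= W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], k)) + 1e-3
    H = rng.random((k, Y.shape[1])) + 1e-3
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

def estimate_proportions(Y, k, n_features=100, n_rounds=5, seed=0):
    """Alternate between estimating proportions and re-selecting features.

    Assumes k >= 2 and a non-negative matrix Y. Returns a k x samples
    matrix whose columns sum to 1.
    """
    # Start from the most variable features (an assumed initialization).
    idx = np.argsort(Y.var(axis=1))[-n_features:]
    per_type = max(1, n_features // k)
    P = None
    for _ in range(n_rounds):
        # Step 1: estimate proportions from the currently selected features.
        _, H = nmf(Y[idx], k, seed=seed)
        P = H / H.sum(axis=0, keepdims=True)
        # Step 2: fit a per-gene profile for every gene given P (least squares).
        W_all = np.linalg.lstsq(P.T, Y.T, rcond=None)[0].T
        W_all = np.clip(W_all, 0.0, None)
        # Step 3: one-vs-rest specificity = largest minus second-largest profile value.
        ordered = np.sort(W_all, axis=1)
        score = ordered[:, -1] - ordered[:, -2]
        top_type = W_all.argmax(axis=1)
        # Step 4: keep the highest-scoring features for each putative cell type.
        selected = []
        for c in range(k):
            members = np.where(top_type == c)[0]
            selected.append(members[np.argsort(score[members])[-per_type:]])
        idx = np.concatenate(selected)
    return P
```

Normalizing the columns of `H` to sum to 1 is a simplification that treats the component scales as comparable. The estimated components are also unlabeled, so matching them to actual cell types would require additional marker information.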
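Single-cell RNA sequencing has become a powerful, high-resolution tool for studying important biological features. However, its sequencing requirements are demanding and its cost is high. Cell type deconvolution can address these limitations. SMCTD (Sparse Model Cell Type Deconvolution) uses a sparse autoencoder to optimize TAPE (Tissue-AdaPtive autoEncoder), yielding higher sensitivity, accuracy, and overall performance when predicting cell type proportions on rectal cancer and PBMC simulated data, and better performance when predicting cell-type-specific gene expression.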