Abstract
Genomic data from complex samples reflect the average signal across multiple cell types, and differences in cell composition bias the results of many downstream analyses. Accurately estimating cell composition is therefore an important first step in analyzing complex samples. Many computational methods have been developed for this purpose, but most have limited applicability because they depend on reference data or prior information that is often unavailable. In this work, we develop a reference-free deconvolution algorithm based on optimal feature selection. The algorithm iteratively searches for optimal cell-type-specific features by integrating two cross-cell-type differential analyses, one comparing a single cell type against the remaining cell types and one comparing two cell types against the remaining cell types, and then estimates cell composition. Simulation studies and analyses of two real datasets show that the proposed method performs favorably.
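The iterative loop described in the abstract (estimate composition, re-select cell-type-specific features, repeat) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: plain multiplicative-update NMF stands in for the deconvolution step, and a simple "lowest value in the group minus highest value in the rest" contrast stands in for the cross-cell-type differential analysis. All function names, parameters, and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np
from itertools import combinations

def nmf(Y, k, n_iter=200, seed=0):
    """Multiplicative-update NMF: Y (features x samples) ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], k)) + 1e-3
    H = rng.random((k, Y.shape[1])) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

def specificity_scores(W):
    """Assumed stand-in for cross-cell-type differential analysis.

    For every group of one cell type (and, when k > 2, every pair),
    score a feature by min(group) - max(rest); keep the best score.
    """
    k = W.shape[1]
    groups = [(i,) for i in range(k)]
    if k > 2:
        groups += list(combinations(range(k), 2))
    scores = np.zeros(W.shape[0])
    for g in groups:
        rest = [j for j in range(k) if j not in g]
        contrast = W[:, list(g)].min(axis=1) - W[:, rest].max(axis=1)
        scores = np.maximum(scores, contrast)
    return scores

def iterative_deconvolution(Y, k, n_features=100, n_rounds=5):
    """Alternate between composition estimation and feature re-selection."""
    # Round 0: start from the most variable features.
    idx = np.argsort(Y.var(axis=1))[-n_features:]
    for _ in range(n_rounds):
        _, H = nmf(Y[idx], k)
        P = H / H.sum(axis=0, keepdims=True)  # columns sum to 1
        # Re-estimate profiles for all features given P (least squares).
        W_all = np.linalg.lstsq(P.T, Y.T, rcond=None)[0].T
        W_all = np.clip(W_all, 0, None)
        idx = np.argsort(specificity_scores(W_all))[-n_features:]
    return P, idx

# Simulated mixture: 300 features, 40 samples, 3 cell types (all made up).
rng = np.random.default_rng(1)
k, n_feat, n_samp = 3, 300, 40
W_true = rng.gamma(2.0, 1.0, size=(n_feat, k))
P_true = rng.dirichlet(np.ones(k), size=n_samp).T
Y = W_true @ P_true + 0.01 * rng.random((n_feat, n_samp))

P_hat, selected = iterative_deconvolution(Y, k, n_features=60)
```

Here `P_hat` holds one estimated composition vector per sample and `selected` holds the indices of the features kept in the final round. Because the factorization is identifiable only up to a permutation and rescaling of components, the estimated components would still need to be matched to cell types before any comparison with `P_true`.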
Source
Advances in Applied Mathematics (《应用数学进展》)
2024, No. 11, pp. 4870-4875 (6 pages)