Abstract
Single-cell ribonucleic acid (RNA) sequencing has been used successfully to generate high-resolution cell atlases of human tissues and organs, deepening researchers' understanding of cellular heterogeneity in diseased human tissues. Cell annotation is a critical step in single-cell RNA sequencing data analysis. Many typical models use a single labeled reference dataset to annotate a target dataset, but some cell types in the target dataset may be absent from the reference. Integrating multiple reference datasets covers the target's cell types better; however, batch effects arise between the reference datasets and the target dataset because of differences in sequencing technology and other factors. To address this, the study proposes a single-cell classification model based on multi-source domain adaptation: each of several reference datasets with annotated cell types is trained adversarially against the unlabeled target dataset to eliminate batch effects. Virtual adversarial training is additionally employed to make the model's predictions more robust to small local perturbations or noise around each data point and to prevent overfitting. Experiments on multiple single-cell datasets show that the model improves cell identification accuracy by at least 5 percentage points over current mainstream models, offering a new option and reference for identifying newly sequenced single cells.
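The virtual adversarial training step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy linear softmax classifier in place of the authors' network, which the abstract does not describe; the names `predict`, `vat_perturbation`, and `vat_loss` and the parameters `eps` and `xi` are hypothetical. The sketch follows the usual one-step power-iteration form of virtual adversarial training: probe a small random direction, take the gradient of the KL divergence between the clean and perturbed predictions, and rescale it to a fixed norm.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, x):
    """Toy linear classifier (an assumption): class probabilities softmax(W x)."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def vat_perturbation(W, x, eps=0.5, xi=1e-3, seed=0):
    """Approximate the perturbation of norm eps that changes the prediction most."""
    rng = random.Random(seed)
    # Start from a random unit direction d.
    d = [rng.gauss(0.0, 1.0) for _ in x]
    d_norm = math.sqrt(sum(v * v for v in d))
    d = [v / d_norm for v in d]
    p = predict(W, x)
    q = predict(W, [a + xi * b for a, b in zip(x, d)])
    # For a linear softmax model, the gradient of KL(p || q) with respect to
    # the perturbation is W^T (q - p); one step of power iteration.
    g = [sum(W[k][j] * (q[k] - p[k]) for k in range(len(W)))
         for j in range(len(x))]
    g_norm = math.sqrt(sum(v * v for v in g))
    if g_norm == 0.0:
        return [0.0] * len(x)
    return [eps * v / g_norm for v in g]


def vat_loss(W, x, eps=0.5):
    """Smoothness penalty: KL between predictions at x and at x + r_adv."""
    r = vat_perturbation(W, x, eps=eps)
    return kl(predict(W, x), predict(W, [a + b for a, b in zip(x, r)]))
```

In training, a penalty of this kind would typically be added to the classification loss so that predictions change little within a small neighborhood of each cell's expression profile; no labels are needed to compute it, so it can also be applied to the unlabeled target dataset.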
Authors
魏琢艺
罗迈
李文兵
曾远松
余伟江
杨跃东
WEI Zhuoyi; LUO Mai; LI Wenbing; ZENG Yuansong; YU Weijiang; YANG Yuedong (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510000, Guangdong, China; National Supercomputer Center in Guangzhou, Sun Yat-Sen University, Guangzhou 510000, Guangdong, China)
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 6, pp. 48-55 (8 pages)
Computer Engineering
Funding
National Key Research and Development Program of China (2022YFF1203100)
National Natural Science Foundation of China (12126610)
Keywords
single-cell ribonucleic acid (RNA) sequencing
single-cell classification
multi-source domain adaptation
adversarial training
deep learning