Self-optimized Single Cell Clustering Using ZINB Model and Graph Attention Autoencoder
Abstract: Clustering plays an important role in the analysis of single-cell data. However, owing to the limitations of sequencing principles and sequencing platforms, single-cell datasets generally suffer from high-dimensional sparsity, high-variance noise, and missing gene data, so clustering analysis and its applications still face many challenges. Existing single-cell clustering methods mainly model the relationship between cells and gene expression; they neither fully mine the latent feature relationships between cells nor remove noise, which yields unsatisfactory clustering results and hinders later analysis of the data. To address these problems, this paper proposes scZDGAC, a self-optimized single-cell clustering algorithm that combines the zero-inflated negative binomial (ZINB) model with a graph attention autoencoder. The algorithm first combines the ZINB model with the scalable DCA denoising algorithm: the ZINB distribution fits the feature distribution of the data more closely, which improves the denoising performance of the autoencoder and reduces the influence of noise and missing data on the output of the KNN algorithm. A graph attention autoencoder then propagates information between cells with different weights, so that the latent features between cells are better captured for clustering. Finally, scZDGAC uses a self-optimization method so that the clustering module and the feature module, originally independent of each other, benefit from each other, and the cluster centers are updated iteratively to further improve clustering performance. The clustering results are evaluated with two common metrics, the adjusted Rand index (ARI) and normalized mutual information (NMI). Comparative experiments against other algorithms on six single-cell datasets of different scales show that the proposed algorithm substantially improves clustering performance over the other methods and demonstrate its robustness.
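For readers unfamiliar with the ZINB model named in the abstract, the following is a minimal sketch of the standard zero-inflated negative binomial distribution: a point mass at zero mixed with a negative binomial. The abstract does not give the paper's exact parameterization or loss, so the mean/dispersion form used here and the names `nb_log_pmf`, `zinb_pmf`, `zinb_nll`, `mu`, `theta`, and `pi` are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """ZINB probability: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    point_mass = pi if x == 0 else 0.0
    return point_mass + (1.0 - pi) * math.exp(nb_log_pmf(x, mu, theta))


def zinb_nll(counts, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of a list of counts sharing one (mu, theta, pi).
    In a denoising autoencoder these parameters would typically be predicted
    per cell and per gene; a single shared triple keeps the sketch short."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


if __name__ == "__main__":
    counts = [0, 0, 0, 1, 4, 0, 2]  # toy sparse expression counts (made up)
    print(zinb_nll(counts, mu=2.0, theta=1.5, pi=0.3))
```

The zero-inflation term `pi` is what lets the model assign extra probability to zeros, which is why this family is a natural fit for sparse count data of the kind the abstract describes.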
Authors: KONG Fengling; WU Hao; DONG Qingqing (School of Information Science and Engineering, Yunnan University, Kunming 650500, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal; 2023, No. 12, pp. 104-112 (9 pages)
Funding: National Natural Science Foundation of China (62061049); Yunnan Province Basic Research Project (2018FB100).
Keywords: Deep clustering; scRNA-seq; ZINB model; Self-optimization; DCA; Graph attention autoencoder