Abstract
To improve the accuracy of the hierarchical affinity propagation clustering algorithm on large-scale gene expression data, an efficient hierarchical affinity propagation algorithm is proposed. The Pearson correlation coefficient is used to measure the similarity between gene expression profiles and to construct the similarity matrix, and global data information is added to the adaptive stage of hierarchical affinity propagation. Experimental results show that, compared with similar existing algorithms, the proposed algorithm clusters large-scale gene expression data quickly and produces clustering results with high Silhouette (Sil) and Calinski-Harabasz (CH) index values.
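The similarity-matrix step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pearson` and `similarity_matrix`, the toy profiles, and the handling of constant profiles are all assumptions made for illustration. It only shows how pairwise Pearson coefficients between expression profiles form a symmetric similarity matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption for this sketch: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_matrix(profiles):
    """Symmetric matrix of pairwise Pearson similarities (hypothetical helper)."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # self-similarity; see the note after the code
        for j in range(i + 1, n):
            s = pearson(profiles[i], profiles[j])
            sim[i][j] = s
            sim[j][i] = s
    return sim


# Toy data: three genes measured under three conditions (made-up values).
profiles = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
]
sim = similarity_matrix(profiles)
```

In standard affinity propagation the diagonal of the similarity matrix holds the "preference" values that control how many exemplars emerge, so a clustering step would normally overwrite the 1.0 entries set here. How the paper chooses those values, and how it adds global data information in the adaptive stage, is not stated in the abstract and is not modelled in this sketch.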
Authors
吴娱
钟诚
尹梦晓
WU Yu, ZHONG Cheng, YIN Meng-xiao (School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China)
Source
《计算机工程与设计》
PKU Core Journal (北大核心)
2016, No. 11, pp. 2961-2966 (6 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61462005)
Guangxi Natural Science Foundation (2014XNSFAA118396, 2014XNSFAA118361)
Keywords
gene expression data
clustering
hierarchical affinity propagation
adaptation
global data