SIAP: a new distributed algorithm for protein complex identification
Abstract: The affinity propagation (AP) algorithm is computationally expensive, and the preference (reference degree) values placed in its similarity matrix affect clustering quality. To address these problems, this paper proposes SIAP, an improved AP algorithm based on Spark. First, unweighted data sets are weighted by fusing the ECC (edge clustering coefficient) and CD algorithms. The preferences of the similarity matrix are then set from the weighting results to improve clustering accuracy, and the improved AP algorithm is parallelized on the Spark platform to reduce running time. The method is applied to PPI (protein-protein interaction) data to identify protein complexes, and results are compared using the F-measure clustering evaluation metric. Experimental results show that the algorithm achieves high clustering accuracy on different PPI networks, outperforms ClusterONE and other classical clustering algorithms, and improves running efficiency with good scalability.
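The abstract describes a two-stage pipeline: weight an unweighted PPI graph with ECC and CD, then run AP with preferences derived from those weights. The Python sketch below illustrates that pipeline on a single machine under stated assumptions. ECC is taken as z_uv / min(d_u - 1, d_v - 1), where z_uv counts triangles through edge (u, v); "CD" is assumed to mean the Czekanowski-Dice distance on closed neighbourhoods; the fusion rule (a plain average), the per-node preference (mean incident edge weight) and similarity 0 for non-edges are hypothetical choices, not the paper's formulas. The AP updates are the standard responsibility/availability messages with damping. The Spark parallelization is not reproduced.

```python
# Sketch only: the ECC/CD fusion rule and the preference rule below are
# hypothetical stand-ins, not the formulas used by SIAP.


def ecc(adj, u, v):
    """Edge clustering coefficient: triangles through (u, v) divided by the
    maximum number the edge could join, min(d_u - 1, d_v - 1)."""
    z = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0


def cd_similarity(adj, u, v):
    """1 - Czekanowski-Dice distance, computed on closed neighbourhoods (assumed)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(nu ^ nv) / (len(nu) + len(nv))


def weight_edges(edges):
    """Build adjacency sets and weight each undirected edge.
    Assumes no self-loops and no duplicate edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Hypothetical fusion: the plain average of the two scores.
    weights = {frozenset((u, v)): (ecc(adj, u, v) + cd_similarity(adj, u, v)) / 2
               for u, v in edges}
    return adj, weights


def affinity_propagation(nodes, weights, iters=200, damping=0.5):
    """Plain affinity propagation on the weighted graph (needs >= 2 nodes)."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    S = [[0.0] * n for _ in range(n)]  # non-edges keep similarity 0 (assumption)
    incident = [[] for _ in range(n)]
    for edge, w in weights.items():
        u, v = (idx[x] for x in edge)
        S[u][v] = S[v][u] = w
        incident[u].append(w)
        incident[v].append(w)
    # Hypothetical preference rule: each node's mean incident edge weight.
    for i in range(n):
        S[i][i] = sum(incident[i]) / len(incident[i]) if incident[i] else 0.0

    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max_{k' != k} (a(i,k') + s(i,k')), row by row.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(x for k, x in enumerate(vals) if k != first)
            for k in range(n):
                best = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # a(k,k) <- sum of positive r(i',k), i' != k. Column by column.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            col = sum(pos) - pos[k]
            for i in range(n):
                new = col if i == k else min(0.0, R[k][k] + col - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # no exemplar emerged: fall back to singleton clusters
        return {v: v for v in nodes}
    labels = {}
    for i in range(n):
        # Exemplars label themselves; other nodes join their most similar exemplar.
        k = i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
        labels[nodes[i]] = nodes[k]
    return labels
```

On two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the single edge (3, 4), `weight_edges` gives the intra-clique edge (0, 1) weight 1.0 and the bridge (3, 4) weight 0.2, so the bridge is strongly down-weighted before clustering. Note that each responsibility row depends only on the same row of A and S, and each availability column only on the same column of R; that independence is what a Spark implementation can exploit by distributing rows and columns, although how SIAP actually partitions the work is not described in this abstract.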
Authors: DENG Chao, LIU Guixia, SUN Liyan, WANG Rongquan (邓超, 刘桂霞, 孙立岩, 王荣全); College of Software, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China
Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2020, No. 11, pp. 1710-1714 (5 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (61772226, 61373051, 61862056).
Keywords: AP algorithm; Spark platform; PPI network; protein complex; F-measure evaluation; ECC and CD weighting; parallel computing
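The abstract compares methods with the F-measure but does not define it. The sketch below assumes a definition common in protein-complex detection: a predicted complex p matches a reference complex b when the overlap score NA(p, b) = |p ∩ b|² / (|p|·|b|) reaches a threshold ω (0.2 is a typical choice); precision P is the fraction of predicted complexes matching some reference complex, recall R the fraction of reference complexes matched by some prediction, and F = 2PR / (P + R). The function names and the default threshold are illustrative, not taken from the paper.

```python
def overlap_score(a, b):
    """Neighbourhood affinity NA(a, b) = |a & b|^2 / (|a| * |b|); sets must be non-empty."""
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def f_measure(predicted, reference, omega=0.2):
    """Return (precision, recall, F) for predicted vs. reference complexes.
    omega is an assumed matching threshold on the overlap score."""
    pred = [set(c) for c in predicted]
    ref = [set(c) for c in reference]
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    # A prediction counts as correct if it matches at least one reference complex.
    matched_pred = sum(any(overlap_score(p, r) >= omega for r in ref) for p in pred)
    # A reference complex counts as recovered if at least one prediction matches it.
    matched_ref = sum(any(overlap_score(p, r) >= omega for p in pred) for r in ref)
    precision = matched_pred / len(pred)
    recall = matched_ref / len(ref)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For predictions {1, 2, 3}, {4, 5}, {9, 10} against references {1, 2, 3, 4}, {5, 6, 7}, {11, 12}, only {1, 2, 3} matches (NA = 9/12 = 0.75), so precision, recall and F are all 1/3.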