Abstract
Spark is a widely used distributed processing framework in the big data field, and it is applied in a growing range of domains. For key protein prediction in Protein-Protein Interaction (PPI) networks, the Betweenness Centrality (BC) index gives good prediction results; BC is computed from the topology of the PPI network. This paper proposes a new index called L1-BC. It distinguishes some proteins that have the same BC value, and because it is computed on subgraphs it also reflects the local properties of proteins. Experimental results show that L1-BC improves the accuracy of key protein prediction. In addition, a parallel algorithm for computing L1-BC is implemented on the Spark platform, where accumulators and broadcast variables greatly reduce memory usage. Experiments on the YDIP dataset show that the Spark-based L1-BC algorithm reaches an acceleration ratio of 94.31%.
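The abstract builds on the standard BC index and on computing it in parallel. What follows is a minimal pure-Python sketch, assuming an unweighted, undirected PPI graph stored as an adjacency dict. `single_source_dependencies` is one step of Brandes' algorithm for BC. `betweenness` splits the source vertices into partitions, which is roughly how Spark tasks might divide the work, and sums the partial scores into one dict that stands in for an accumulator. The shared read-only graph plays the role of a broadcast variable. All function names and `num_partitions` are hypothetical. The abstract does not define L1-BC. `ego_betweenness` computes a vertex's BC inside its 1-hop neighborhood subgraph, and it is only an illustrative guess at "computing on a subgraph", not the authors' formula.

```python
from collections import deque


def single_source_dependencies(graph, s):
    """Brandes' single-source step on an unweighted graph:
    BFS from s, then back-propagate pair dependencies delta_s(v)."""
    sigma = {v: 0 for v in graph}   # number of shortest paths from s to v
    dist = {v: -1 for v in graph}   # BFS distance from s (-1 = unvisited)
    preds = {v: [] for v in graph}  # predecessors of v on shortest paths
    sigma[s], dist[s] = 1, 0
    order = []                      # vertices in non-decreasing distance
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    for w in reversed(order):       # farthest vertices first
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
    delta[s] = 0.0                  # the source itself gets no credit
    return delta


def betweenness(graph, num_partitions=4):
    """Sum single-source dependencies over all sources.
    Hypothetical analogy: each partition ~ one Spark task,
    `bc` ~ an accumulator, read-only `graph` ~ a broadcast variable."""
    sources = list(graph)
    partitions = [sources[i::num_partitions] for i in range(num_partitions)]
    bc = {v: 0.0 for v in graph}
    for part in partitions:
        for s in part:
            for v, d in single_source_dependencies(graph, s).items():
                bc[v] += d
    # Undirected graph: each vertex pair was counted from both ends.
    return {v: b / 2 for v, b in bc.items()}


def ego_betweenness(graph, v):
    """Hypothetical 'local' BC: BC of v inside the subgraph induced
    by v and its direct neighbors (not the paper's L1-BC definition)."""
    nodes = {v} | set(graph[v])
    sub = {u: [w for w in graph[u] if w in nodes] for u in nodes}
    return betweenness(sub)[v]
```

For example, on the path `a-b-c-d`, vertex `b` has global BC 2 (it lies on the shortest paths a-c and a-d). Its BC in the 1-hop subgraph `{a, b, c}` is only 1, which shows how restricting to a subgraph yields a more local score.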
Authors
HU Deqi (胡德祺), SUN Yongqi (孙永奇), QIN Chao (秦朝)
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 24, pp. 234-240 (7 pages)
Funding
National Natural Science Foundation of China (No. 61572005, No. 61672086, No. 61272004)