Abstract
Frequent subgraph mining is an important problem in many practical application domains. Because the task is computationally intensive and both the graph sets being mined and the results are large, existing frequent subgraph mining solutions cannot meet time requirements, and processing efficiency is currently the main challenge. This paper proposes cmFSM, an original frequent subgraph mining tool with parallel acceleration. cmFSM applies parallel optimization at three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process parallelization, and CPU-MIC collaborative parallelization. On a single node, cmFSM is twice as fast as the best CPU-based algorithm, and in the multi-node scheme it provides scalability. The results show that, even with only modest parallel computing resources, cmFSM significantly outperforms existing state-of-the-art algorithms. This demonstrates the effectiveness of the proposed tool in the field of bioinformatics. Future work will continue to improve the scalability of the multi-node solution.
Authors
彭绍亮
牛琦
李肯立
邹权
PENG Shaoliang; NIU Qi; LI Kenli; ZOU Quan (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China)
Source
Big Data Research (《大数据》)
2019, No. 2, pp. 89-103 (15 pages)
Funding
National Key Research and Development Program of China (No.2017YFB0202602, No.2018YFC0910405, No.2017YFC1311003, No.2016YFC1302500, No.2016YFB0200400, No.2017YFB0202104)
National Natural Science Foundation of China (No.61772543, No.U1435222, No.61625202, No.61272056)
Keywords
frequent subgraph mining
bioinformatics
parallel algorithm
memory constraints
isomorphism
many integrated core