期刊文献+

面向大数据的图模式挖掘概率算法 被引量:3

Graph pattern mining probability algorithm for big data
下载PDF
导出
摘要 在当今大数据时代,MapReduce等大数据处理框架处理数据能力有限,其在处理有关图的数据时常常显得缓慢低效,典型如3-clique计数问题,故需要探究一种高效的算法处理这类clique计数问题。由于在前人文献中对3-clique计数问题已有深入探讨,故针对该问题的扩展版本(4-clique计数问题)进行探究。在一个启发式的想法下提出了基于邻边采样的概率采样算法,利用切诺夫不等式证明该算法在近似条件下只需要一定数量的采样器作为相对误差的性能保证。通过实验评估对比发现,相对于传统精确算法,概率采样算法虽然在结果上损失了少量的精度,但在算法运行时间和空间占用上具有巨大的优势。最后得出其在实际应用中具有巨大实践价值的结论。 In today’s big data era,big data processing frameworks such as MapReduce often appear slow and inefficient when processing data,specially related to graphs.Therefore,it is necessary to explore an efficient algorithm to handle this type of clique counting problem.Since the predecessor literatures have thoroughly explored the 3-clique counting,the extended version of the problem(the 4-clique counting problem)improves its position gradually.Under the guidance of a heuristic idea,this paper proposed a probability sampling algorithm based on neighboring edge sampling to solve the extended problem.With the usage of Chernoff inequality,the algorithm only needed a certain number of samplers as the performance guarantee of relative error under the approximate condition.Later,the experimental evaluation and comparison shows that the probability sampling algorithm loses a small amount of precision compared with the traditional precision algorithm,but it has great advantages in algorithm running time and space occupation.Finally,it comes to the conclusion that it has great practical value in practical applications.
作者 姜丽丽 李叶飞 豆龙龙 陈智麒 钱柱中 Jiang Lili;Li Yefei;Dou Longlong;Chen Zhiqi;Qian Zhuzhong(Jiangsu Frontier Electric Technology Co.Ltd.,Nanjing 210000,China;Dept.of Computer Science&Technology,Nanjing University,Nanjing 210023,China)
出处 《计算机应用研究》 CSCD 北大核心 2020年第12期3545-3551,共7页 Application Research of Computers
基金 国家自然科学基金面上项目(61872175) 江苏省自然科学基金面上项目(BK20181252)。
关键词 4-clique计数问题 概率化算法 图模式挖掘 大数据处理 近似算法 4-clique counting problem probability algorithm graph pattern mining big data processing approximation algorithm
  • 相关文献

参考文献6

二级参考文献147

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 2Amazon SimpleDB. http://aws, amazon, com/simpledb/, 2011-8-10.
  • 3Connor Alexander G, Chrysanthis Panos K, Labrinidis Alexandros. Key key-value stores for efficiently processing graph data in the cloud//Proceedings of the GDM. Hannover, Germany, 2011:88-93.
  • 4Lordanov Borislav. HyperGraphDB: A generalized graph database//Proceedings of the IWGD. JiuZhai Valley, China, 2010:25-36.
  • 5Eifrem Emil. NOSQL: Scaling to size and scaling to complexity, http://blogs, neotechnology, com/emil/2009/11/ nosql-scaling tosize-and-scaling-to-complexity, html, 2009- 1-15.
  • 6Wu Sai, Jiang Da-Wei, Ooi Beng Chin et al. Efficient B-tree based indexing for cloud data proeessing//Proeeedings of the VLDB. Singapore, 2010: 1207-1218.
  • 7Wang Jin-Bao, Wu Sai, Gao Hong et al. Indexing multi dimensional data in a cloud system//Proceedings of the SIGMOD. Indianapolis, Indiana, USA, 2010: 591-602.
  • 8Tsatsanifos George, Sacharidis Dimitris, Sellis Timos et al. MIDAS: Multi-attribute indexing for distributed architecture systems//Proceedings of the SSTD. Minneapolis, MN, USA, 2011:168-185.
  • 9Aguilera M K, Golab W, Shah M A. A practical scalable distributed B-tree//Proceedings of the VLDB. Auckland, New Zealand, 2008: 598-609.
  • 10Zhang Xiang-Yu, Ai Jing, Wang Zhong-Yuan, Lu Jia-Heng et al. An efficient multi-dimensional index for cloud data management//Proceedings of the CloudDB. Hong Kong, China, 2009:17-24.

共引文献505

同被引文献30

引证文献3

二级引证文献4

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部