Large Scale Induced Subgraphs Mining Algorithm on Self Adaptive Cloud

Original title: 自适应云端的大规模导出子图提取算法

Cited by: 7

Abstract: Existing cloud computing platforms allocate resources essentially at random, and traditional induced subgraph mining is inefficient. To improve both the utilization of cloud resources and the efficiency of large-scale induced subgraph mining, this paper proposes a large-scale induced subgraph extraction algorithm for a self-adaptive cloud. The paper first introduces the concepts of cloud computing and induced subgraph mining and states the problem. It then designs SAC_TA (Self Adaptive Cloud Dynamic Allocation), a self-adaptive dynamic task allocation algorithm built on the MapReduce parallel processing model, which allocates system resources according to the computing task so as to minimize cost, and presents the self-adaptive cloud framework. On top of this framework it proposes the large-scale induced subgraph mining algorithm SFGFF (SAC_TA, Find_VE, G_F1, FindPartFG, FindAllFG), which mines in four stages; applying all of these algorithms on the self-adaptive cloud yields a complete induced subgraph mining system. Finally, experiments on synthetic data and on real-world data show that the self-adaptive cloud runs well and that the algorithms are effective and feasible, with a high speedup ratio and good running efficiency, meeting the needs of large-scale frequent induced subgraph mining.
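To make the terminology in the abstract concrete, the following is a minimal Python sketch of two ideas it relies on: the induced subgraph of a vertex set, and a map/reduce-style support count over a graph database. It is not the paper's SFGFF or SAC_TA; the record does not give those algorithms in detail. All function names (`induced_subgraph`, `map_edges`, `reduce_counts`, `frequent_edges`) and the sample data are hypothetical, and the map and reduce steps run in a single process rather than on a MapReduce cluster.

```python
from collections import Counter
from itertools import chain


def induced_subgraph(edges, vertices):
    """Edges of the subgraph induced by `vertices`.

    An induced subgraph keeps every edge of the original graph whose
    two endpoints both lie in the chosen vertex set.
    """
    keep = set(vertices)
    return {frozenset((u, v)) for u, v in edges if u in keep and v in keep}


def map_edges(graph_id, edges):
    """Map step (illustrative): emit (edge, 1) once per graph."""
    for edge in {frozenset(e) for e in edges}:
        yield edge, 1


def reduce_counts(pairs):
    """Reduce step (illustrative): sum the emitted counts per edge."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def frequent_edges(graph_db, min_support):
    """Edges that occur in at least `min_support` graphs of the database."""
    pairs = chain.from_iterable(
        map_edges(gid, edges) for gid, edges in graph_db.items()
    )
    counts = reduce_counts(pairs)
    return {edge for edge, count in counts.items() if count >= min_support}


# Hypothetical sample data, undirected edges given as vertex pairs.
graph_db = {
    "g1": [("a", "b"), ("b", "c")],
    "g2": [("a", "b"), ("c", "d")],
    "g3": [("b", "a")],
}
triangle_plus_tail = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
```

Calling `induced_subgraph(triangle_plus_tail, {"a", "b", "c"})` keeps the three triangle edges and drops the edge to `d`, and `frequent_edges(graph_db, 2)` keeps only the edge between `a` and `b`. Frequent single edges are a common starting point for growing larger frequent patterns, but whether the paper's first stage works this way is an assumption.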

Authors: 郭鑫 (Guo Xin), 董坚峰 (Dong Jianfeng), 周清平 (Zhou Qingping)

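Affiliations: School of Software Service Outsourcing, Jishou University (吉首大学软件服务外包学院); Center for Studies of Information Resources, Wuhan University (武汉大学信息资源研究中心)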

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 6, pp. 155-160, 198 (7 pages)

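Funding: Supported by the Key Project of the Hunan Provincial Industrial Support Program (2012GK2006), Scientific Research Projects of the Hunan Provincial Department of Education (12C0291, 11C1051), the Open Fund of the Hunan Provincial Key Laboratory of Ecotourism (JDSTLY201206), and a 2013-2014 Key Project of the Hunan Library Society (XHZD1007).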

Keywords: Big data; Data mining; Cloud computing; Induced subgraph; Subgraph isomorphism

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

