期刊文献+

利用MapReduce平台实现高效并行的频繁子图挖掘 被引量:4

Using MapReduce Platform to Achieve Efficient Parallel Mining of Frequent Subgraphs
下载PDF
导出
摘要 频繁子图挖掘是数据挖掘领域的一个重要问题,并且有着广泛的应用。在Hadoop平台上实现了一种基于MapReduce的高效频繁子图挖掘算法Cloud-GFSG(cloud-global frequent subgraph)。该算法基于Apriori思想,在扩展边生成新的子图时,使用已经挖掘出的k-1阶的频繁子图生成k阶的频繁子图。同时,检查是否存在待扩展生成的子图,设定生成的频繁子图表示规则,保证了频繁子图信息的唯一性。较同类算法相比,该算法在挖掘频繁子图时更具通用性,并且在扩展边时避免产生大量的复制图,从而使得算法的正确性得以保证,且运行效率显著提高。 Frequent subgraph mining is an important problem in data mining domain and has been used widely. This paper proposes an efficient algorithm Cloud-GFSG (cloud-global frequent subgraph), by using MapReduce on Hadoop platform for mining frequent subgraphs. The algorithm is based on the principle of Apriori. It uses the discovered frequent subgraphs whose support is k-1 to generate the candidate frequent subgraphs whose support is k when it generates new subgraphs by extending edge. Meanwhile, it checks whether there exists any subgraph which would be gener-ated and sets the frequent subgraph generation rules to ensure the uniqueness of the frequent subgraphs. Compared with the state-of-the-art algorithms, the proposed algorithm has more general function and can avoid generating replicate graphs while extending a new edge. Therefore, its correctness can be ensured and the efficiency had been improved significantly.
出处 《计算机科学与探索》 CSCD 2014年第7期790-801,共12页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金(61173093 61202182) 陕西省自然科学基金(2013JM8019 2014JQ8359) 中国博士后科学基金(2012M521776) 中央高校基本科研业务费专项资金(K50510230001 BDY10)~~
关键词 频繁子图挖掘 MAPREDUCE HADOOP平台 frequent subgraph mining MapReduce Hadoop platform
  • 相关文献

参考文献19

  • 1Dehaspe L, Toivonen H, King R. Finding frequent substruc- tures in chemical compounds[C]//Proceedings of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), Seattle, USA, Aug 22-25, 2004. New York, NY, USA: ACM, 2004: 30-36.
  • 2Fatta G, Berthold M. Dynamic load balancing for the distrib- uted mining of molecular structures[J]. IEEE Transactions on Parallel and Distributed System, 2006, 17(8): 773-785.
  • 3Wang Ke, Liu Huiqing. Discovering typical structures of documents: a road map approach[C]//Proceedings of the 21st Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '98), Melbourne, Australia, Aug 24-28, 1998. New York, NY, USA: ACM, 1998: 146-154.
  • 4Kriegel H, Schonauer S. Similarity search in structured data[C]// Proceedings of the 5th International Conference on Data Ware- housing and Knowledge Discovery (DaWak '03), Prague, Czech Republic, 2003. Berlin: Springer-Verlag, 2003: 309-319.
  • 5Fischer A, Riesen K, Bunke H. An experimental study of graph classification using prototype selection[C]//Procee- dings of the 19th International Conference on Pattern Rec- ognition (ICPR '08), Florida, USA, 2008. Washington, DC, USA: IEEE Computer Society, 2008: 1-4.
  • 6Huang Jianbin, Sun Heli, Han Jiawei, et al. SHR/NK: a struc- tural clustering algorithm for detecting hierarchical commu- nities in networks[C]//Proceedings of the 19th ACM Inter-national Conference on Information and Knowledge Manage- ment (CIKM '10), Toronto, Canada, Oct 26-30, 2010. New York, NY, USA: ACM, 2010: 219-228.
  • 7Huang Jianbin, Sun Heli, Song Qinbao, et al. Revealing density-based clustering from the core-connected tree of a network[J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1876-1889.
  • 8Yan Xifeng, Yu P, Han Jiawei. Graph indexing: a frequent structure-based approach[C]//Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD '04), Baltimore, USA, Jul 27-29, 2004. New York, NY, USA: ACM, 2004: 335-346.
  • 9Williams D W, Huan Jun, Wang Wei. Graph database indexing using structured graph decomposition[C]//Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE '07), Istanbul, Turkey, Apr 15-20, 2007. Washington, DC, USA: IEEE Computer Society, 2007: 231-235.
  • 10Hill S, Srichandan B, Sunderraman R. An iterative Map- Reduce approach to frequent subgraph mining in biological datasets[C]//Proceedings of the 3rd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB '12), Orlando, USA, Oct 7-10, 2012. New York, NY, USA: ACM, 2012: 661-666.

同被引文献36

  • 1ANCHURI P, ZAKI M J, BARKOL O, et al.Approximate graph mining with label costs[C].Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining.ACM, 2013 : 518-526.
  • 2KANG U,AKOGLU L, CHAU D H P.Big graph mining: algorithms, anomaly detection, and applications[J].Proceedings of the ACM ASONAM, 2013,13 : 25-28.
  • 3ZHU F ,QU Q ,LO D, et al.Mining top-k large structural patterus in a massive network[J].Proceedings of the VLDBEndowment, 2011,4( 11 ) : 807-818.
  • 4AKOGLU L, CHAU D H, KANG U, et al.Opavion : mining and visualization in large graphs[C].Proeeedings of the 2012 ACM SIGMOD International Conference on Management of Data.ACM, 2012 : 717-720.
  • 5SARMA A D, AFRATI F N, SALIHOGLU S, et al. Upper and lower hounds on the east of a map-reduce e:mputa- tion[C].Proceedings of the VLDB Endowment. VLDB Endowment, 2013,6(4) : 277-288.
  • 6BORGELT C, MEINL T, BERTHOLD M. Moss : a program for molecular substructure mining[C].Proeeedings of the 1st international workshop on open source data mining:frequent pattern mining implementations. ACM, 2005 : 6-15.
  • 7BORGELT C.Canonical forms for frequent graph mining[M] Advances in Data Analysis, Springer Berlin Heidelberg, 2007 : 337-349.
  • 8LESKOVEC J.Stanford large network dataset collection[J]. URL http ://snap.stanford.edu/data/index.html, 2011.
  • 9Anchuri P, Zaki M J, Barkol O, et al. Approximate graph mining with label costs[C]. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013: 518-526.
  • 10Kang U, Akoglu L, Chau D H P. Big Graph Mining: Algorithms, Anomaly Detection, and Applications [J].Proceedings of the ACM ASONAM, 2013, 13: 25-28.

引证文献4

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部