Research on Scalable Sequential Pattern Mining Based on MapReduce Model
Abstract: Sequential pattern mining is one of the research topics in data mining, but traditional algorithms generally scale poorly when processing big data. To improve scalability, this paper proposes SPAMC, a MapReduce-based sequential pattern mining algorithm for the cloud. An iterative MapReduce framework is designed to generate candidate patterns efficiently and to prune them while the lexical sequence tree is being constructed. The framework not only distributes the subtasks of the tree structure to independent mappers running in parallel, but also computes support counts in parallel. A cloud environment was built from 32 virtual machines, and comprehensive experiments were run on up to 13 million transaction sequences. The results show that SPAMC greatly reduces the time needed to mine big data, achieves very high scalability, and provides ideal load balancing across the cloud cluster.
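Author: Zhu Lin (朱林)
Source: Bulletin of Science and Technology (《科技通报》), 2018, No. 1, pp. 212-217, 244 (7 pages)
Funding: Guizhou Provincial Science and Technology Fund (黔科合LH字[2014]7216号), "Improving the IaaS Component of Big Data System Architectures Using Trusted Computing Technology"
Keywords: sequential pattern mining; big data; cloud computing; MapReduce framework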
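
The abstract describes an iterative scheme in which mappers count candidate support in parallel and infrequent candidates are pruned before the next round. The sketch below illustrates that general idea only and is not the SPAMC algorithm itself. It assumes each sequence is a plain list of single items, it extends patterns only by appending one item, and it simulates the mappers with an ordinary loop. All function names (`map_count`, `reduce_counts`, `extend`, `mine`) are hypothetical.

```python
from collections import Counter


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def map_count(partition, candidates):
    """Mapper: count, within one data partition, the sequences that contain each candidate."""
    local = Counter()
    for seq in partition:
        for cand in candidates:
            if is_subsequence(cand, seq):
                local[cand] += 1
    return local


def reduce_counts(partials, min_support):
    """Reducer: sum the partial counts and keep candidates that reach min_support."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {cand: n for cand, n in total.items() if n >= min_support}


def extend(frequent, items):
    """Build next-level candidates by appending one frequent item to each frequent pattern."""
    return [pattern + (item,) for pattern in frequent for item in items]


def mine(partitions, min_support):
    """Run map/reduce rounds level by level until no candidate survives pruning."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({i for part in partitions for seq in part for i in seq})
    candidates = [(i,) for i in items]
    frequent_items = None
    result = {}
    while candidates:
        # In a real cluster each partition would be handled by its own mapper.
        partials = [map_count(part, candidates) for part in partitions]
        frequent = reduce_counts(partials, min_support)
        if frequent_items is None:
            frequent_items = sorted(p[0] for p in frequent)
        result.update(frequent)
        # Only frequent patterns are extended, which prunes the search tree.
        candidates = extend(sorted(frequent), frequent_items)
    return result


# Toy data: two partitions of two sequences each (illustrative values only).
partitions = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
patterns = mine(partitions, min_support=2)
```

With this toy input, `patterns` contains the three single items and the two-item patterns `("a", "b")`, `("a", "c")` and `("b", "c")`. The three-item candidate `("a", "b", "c")` occurs in only one sequence and is pruned.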
