Sequential Pattern Mining Algorithm Based on MapReduce (cited by: 2)
Abstract Traditional data mining algorithms have limited computing capacity when processing massive data sets. To address this problem, a distributed sequential pattern mining algorithm based on the MapReduce programming model, named MR PrefixSpan, is proposed. Building on the PrefixSpan algorithm, the mining task is partitioned by prefix: the Map function mines the sequential patterns derived from each prefix and constructs the projected databases in parallel, which improves mining efficiency and simplifies the search space. The Reduce function then merges the intermediate results to obtain the global set of sequential patterns. Experimental results on a Hadoop cluster show that MR PrefixSpan reduces database scanning time and achieves a high parallel speedup ratio and good scalability.
Source Computer Engineering (《计算机工程》), CAS, CSCD, 2012, No. 15, pp. 43-45 (3 pages)
Funding National Natural Science Foundation of China (60873247); Natural Science Foundation of Shandong Province (ZR2009GZ007)
Keywords cloud computing; parallel processing; MapReduce model; PrefixSpan algorithm; sequential pattern; Hadoop platform
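
The prefix-based task split described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes each sequence element is a single item (no itemsets), runs the map tasks in an ordinary loop rather than on Hadoop, and all function names (`project`, `map_task`, `reduce_task`, `mr_prefixspan`) are hypothetical.

```python
from collections import defaultdict


def frequent_items(db, min_sup):
    """Count, for each item, how many sequences contain it; keep the frequent ones."""
    counts = defaultdict(int)
    for seq in db:
        for item in set(seq):
            counts[item] += 1
    return {item: c for item, c in counts.items() if c >= min_sup}


def project(db, item):
    """Projected database: the suffix after the first occurrence of `item`."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:  # empty suffixes cannot extend any pattern
                projected.append(suffix)
    return projected


def prefixspan(prefix, db, min_sup):
    """Serial PrefixSpan on a projected database; yields (pattern, support)."""
    for item, sup in sorted(frequent_items(db, min_sup).items()):
        pattern = prefix + (item,)
        yield pattern, sup
        yield from prefixspan(pattern, project(db, item), min_sup)


def map_task(prefix_item, db, min_sup):
    """Map (assumed form): mine every longer pattern that starts with one frequent item."""
    return list(prefixspan((prefix_item,), project(db, prefix_item), min_sup))


def reduce_task(partials):
    """Reduce (assumed form): merge per-prefix results into one global pattern table."""
    merged = {}
    for part in partials:
        for pattern, sup in part:
            merged[pattern] = sup
    return merged


def mr_prefixspan(db, min_sup):
    """Driver: one map task per frequent length-1 prefix, then a single reduce."""
    f1 = frequent_items(db, min_sup)
    # On a cluster, each map_task call would run on a different node.
    partials = [map_task(item, db, min_sup) for item in sorted(f1)]
    result = {(item,): sup for item, sup in f1.items()}
    result.update(reduce_task(partials))
    return result


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
    for pattern, sup in sorted(mr_prefixspan(db, min_sup=2).items()):
        print(pattern, sup)
```

Because patterns with different first items never overlap, the map tasks share no state, which is what makes the per-prefix split straightforward to parallelize.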