Abstract
Sequential pattern mining is an active research topic in data mining, but traditional algorithms generally scale poorly when applied to big data. To improve scalability, this paper proposes SPAMC, a MapReduce-based sequential pattern mining algorithm for the cloud. An iterative MapReduce framework is designed to generate candidate patterns efficiently and to prune them while the lexical sequence tree is being constructed. The framework not only distributes the subtasks of the tree structure to independent mappers running in parallel, but also parallelizes support counting. A cloud environment was built from 32 virtual machines, and comprehensive experiments were run on up to 13 million transaction sequences. The results show that SPAMC substantially shortens mining time on big data, scales very well, and achieves ideal load balancing across the cloud cluster.
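The iterative scheme described in the abstract (generate candidates level by level, count support in parallel, prune infrequent nodes before extending the tree) can be illustrated with a minimal in-process sketch. This is not the SPAMC implementation: the function names (`map_support`, `reduce_support`, `mine`), the representation of a sequence as a flat list of single items, and the simulation of mappers as plain function calls over data partitions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def map_support(partition, candidates):
    """Map phase (hypothetical): emit (candidate, 1) for every sequence
    in this partition that contains the candidate."""
    return [(c, 1) for seq in partition for c in candidates
            if is_subsequence(c, seq)]


def reduce_support(pairs):
    """Reduce phase (hypothetical): sum the emitted counts per candidate."""
    counts = Counter()
    for candidate, n in pairs:
        counts[candidate] += n
    return counts


def mine(partitions, min_support):
    """Iterate map/reduce rounds, one tree level per round, pruning
    candidates whose support falls below min_support."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({i for part in partitions for seq in part for i in seq})
    candidates = [(i,) for i in items]
    frequent = {}
    while candidates:
        # Each partition stands in for one independent mapper.
        pairs = chain.from_iterable(
            map_support(part, candidates) for part in partitions)
        counts = reduce_support(pairs)
        survivors = [c for c in candidates if counts[c] >= min_support]
        frequent.update({c: counts[c] for c in survivors})
        # Extend only surviving nodes, and only with frequent items.
        freq_items = [i for i in items if (i,) in frequent]
        candidates = [c + (i,) for c in survivors for i in freq_items]
    return frequent


if __name__ == "__main__":
    data = [[["a", "b", "c"], ["a", "c"]], [["a", "b"], ["b", "c"]]]
    for pattern, support in sorted(mine(data, 2).items()):
        print(pattern, support)
```

In a real deployment each call to `map_support` would run on a separate mapper over its own data split, and the pruning step is what keeps the number of candidates passed to the next round manageable.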
Source
《科技通报》
2018, No. 1, pp. 212-217, 244 (7 pages in total)
Bulletin of Science and Technology
Funding
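Guizhou Provincial Science and Technology Fund (黔科合LH字[2014]7216号), "Improving the IaaS Components of Big Data System Architectures Using Trusted Computing Technology"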