Sequential Pattern Mining Based on Pattern Space Division in Map/Reduce Cluster

Times cited: 1

Abstract: The method divides the pattern space. This turns the Map/Reduce problem of handling the many-to-many correspondence between the data set and the set of candidate sequential patterns into a smaller problem: handling the many-to-many correspondence between the data set and a collection of pattern subspaces, each rooted at one frequent length-1 sequence. As a result, the intermediate key/value set shrinks dramatically, and the single-Map-node bottleneck caused by the combinatorial explosion of the candidate pattern space is avoided. Three rounds of Map/Reduce tasks construct and divide the pattern space and set up and apply the filtering rules. On that basis, sequential patterns are then mined in each pattern subspace independently. The method exploits both the global characteristics of the whole pattern space and the individual characteristics of each subspace. On this basis, an optimized non-recursive mining algorithm is designed that reduces how many prefix-projected databases are constructed and how often they are scanned, improving the efficiency of the mining phase.
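The abstract only outlines the approach. The following is a minimal single-process sketch of the idea, built on several assumptions that do not come from the paper. Each sequence element is a single item (no itemsets), and support is an absolute count. The `map_reduce` helper is a hypothetical in-memory stand-in for a Map/Reduce cluster, and the split of work across the three rounds is a guess. Subspace mining here is plain PrefixSpan made non-recursive with an explicit stack. It does not reproduce the authors' optimizations for reducing projected-database construction and scans.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Seq = List[str]
Pattern = Tuple[str, ...]


def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Hypothetical in-memory stand-in for one Map/Reduce job:
    map every record, group the emitted (key, value) pairs by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def frequent_items(db: List[Seq], min_sup: int) -> set:
    """Round 1 (assumed): count each item once per sequence, keep frequent 1-sequences."""
    counts = map_reduce(
        db,
        lambda seq: ((item, 1) for item in set(seq)),
        lambda _key, ones: sum(ones),
    )
    return {item for item, count in counts.items() if count >= min_sup}


def divide_pattern_space(db: List[Seq], f1: set) -> Dict[str, List[Seq]]:
    """Round 2 (assumed): key each sequence by every frequent item x it contains.
    The value is the suffix after the first x with infrequent items filtered out,
    so subspace x holds exactly the data needed to mine patterns starting with x."""
    def mapper(seq: Seq):
        seen = set()
        for i, item in enumerate(seq):
            if item in f1 and item not in seen:
                seen.add(item)
                yield item, [y for y in seq[i + 1:] if y in f1]

    return map_reduce(db, mapper, lambda _key, suffixes: suffixes)


def mine_subspace(item: str, projected: List[Seq], min_sup: int) -> Dict[Pattern, int]:
    """Round 3 (assumed): PrefixSpan over one subspace, non-recursive via an explicit stack."""
    results: Dict[Pattern, int] = {(item,): len(projected)}
    stack = [((item,), projected)]
    while stack:
        prefix, suffixes = stack.pop()
        support: Dict[str, int] = defaultdict(int)
        for suffix in suffixes:
            for y in set(suffix):  # count each item once per suffix
                support[y] += 1
        for y, count in support.items():
            if count < min_sup:
                continue
            extended = prefix + (y,)
            results[extended] = count
            # Project: keep what follows the first occurrence of y in each suffix containing y.
            stack.append((extended, [s[s.index(y) + 1:] for s in suffixes if y in s]))
    return results


def mine(db: List[Seq], min_sup: int) -> Dict[Pattern, int]:
    f1 = frequent_items(db, min_sup)
    patterns: Dict[Pattern, int] = {}
    # Subspaces share no patterns, so each could run as an independent Reduce task.
    for item, projected in divide_pattern_space(db, f1).items():
        patterns.update(mine_subspace(item, projected, min_sup))
    return patterns


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "c"]]
    for pattern, sup in sorted(mine(sample, min_sup=2).items()):
        print(pattern, sup)
```

With `min_sup=2`, the sample database yields seven patterns. Item `d` is filtered out in round 1, and the pattern `('a', 'b', 'c')` has support 2. The intermediate set stays small because pairs are keyed by frequent 1-sequence rather than by candidate pattern. In this sketch, each input sequence emits at most one pair per distinct frequent item it contains.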

Authors: Liu Qian, Chen Ming

Affiliation: Department of Computer Science and Technology, China University of Petroleum

Source: Microelectronics & Computer (indexed in CSCD and the Peking University core journal list), 2012, No. 9, pp. 149-151 and 156 (4 pages)

Keywords: Map/Reduce; pattern space division; sequential pattern mining; cloud computing

Classification number (CLC): TP39 [Automation and Computer Technology: Computer Application Technology]



