Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream

Cited by: 4
Abstract: To improve the efficiency of frequent pattern mining on data streams, this paper designs DPFP-Stream (Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream), a distributed parallel algorithm based on the classical FP-Stream algorithm and the principles of distributed parallel computing. The algorithm splits the task of building the frequent pattern tree into a local part and a global part, and introduces a "current time" parameter. Arriving stream data is distributed evenly across multiple local nodes. Each local node uses the FP-Growth algorithm to generate its candidate frequent itemsets for the current unit of time, then packages those itemsets with their support counts, one package per unit of time, and sends them to the global node. The global node merges the intermediate results from the local nodes according to the "current time" and updates the global Pattern-Tree. Implementation and performance tests on Storm, a distributed data stream computing platform, show that the computing efficiency of DPFP-Stream increases linearly as local nodes or local bolt threads are added, and that the algorithm is suitable for efficiently mining frequent patterns from data streams.
Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 7, pp. 75-79 (5 pages)
Funding: National Natural Science Foundation of China (61302158, 61571238); ZTE Industry-University-Research Cooperation Project
Keywords: data stream; frequent pattern; distributed parallelization; Storm
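
The local/global split described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it does not use Storm, it replaces FP-Growth with brute-force itemset counting, it applies no local pruning (so the merged counts are exact), and it keeps a plain per-time-unit table instead of a Pattern-Tree. All function and parameter names (`split_evenly`, `local_mine`, `global_merge`, `max_len`, `min_support`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_evenly(transactions, n_local):
    """Distribute one time unit's transactions evenly over n_local workers."""
    return [transactions[i::n_local] for i in range(n_local)]


def local_mine(transactions, max_len=2):
    """Local step. Stand-in for FP-Growth (assumption): count every itemset
    of up to max_len items and return {itemset: support count}."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def global_merge(pattern_table, time_unit, local_results):
    """Global step. Sum the support counts that the local workers report
    for the same time unit and store them under that time unit."""
    merged = Counter()
    for result in local_results:
        merged.update(result)
    pattern_table[time_unit] = merged
    return merged


def frequent(merged, n_transactions, min_support):
    """Keep itemsets whose relative support reaches min_support."""
    threshold = min_support * n_transactions
    return {p: c for p, c in merged.items() if c >= threshold}


# One unit of time, two local workers.
batch = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
pattern_table = {}
parts = split_evenly(batch, n_local=2)
merged = global_merge(pattern_table, 0, [local_mine(p) for p in parts])
result = frequent(merged, len(batch), min_support=0.5)
print(sorted(result.items()))
```

Because each worker here reports every itemset it sees, the merged counts equal those of a single-node count. An algorithm that prunes locally, as the "candidate frequent itemsets" wording suggests, would trade that exactness for less traffic to the global node.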