Parallel Mining Frequent Patterns in Big Data Based on Bitmap Computation

Cited by: 5
Abstract: This paper proposes a parallel algorithm for mining frequent patterns in big data on top of the MapReduce framework. First, it studies a method for discovering the frequent patterns of a dataset using bitmap computation, which requires scanning the dataset only once. Second, it extends the traditional MapReduce framework with additional computation functions, including bitmap computation, frequent pattern mining, and pruning of insignificant patterns. To improve the performance of pattern mining on big data, a pruning algorithm is also designed to identify and delete the insignificant patterns in a dataset. Finally, experimental results show that the proposed algorithm is efficient, highly scalable, and outperforms comparable algorithms.
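Author: Chen Hui (陈辉)
Source: Journal of Chinese Computer Systems (小型微型计算机系统), CSCD, Peking University Core Journal, 2014, No. 7, pp. 1599-1603 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61262033, 61262009, 61363075) and by the Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ13303)
Keywords: big data; frequent pattern mining; bitmap computation; MapReduce framework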
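
The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of the general idea of bitmap-based frequent pattern mining, not the author's method. It assumes that each item gets one bitmap built in a single scan (bit i is set when transaction i contains the item), that the support of an itemset is the population count of the AND of its items' bitmaps, and that "pruning" simply means dropping itemsets below a minimum support. The paper's notion of insignificant patterns and its MapReduce extension may well differ. The names `build_bitmaps`, `support`, and `mine_frequent` are hypothetical.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Single scan: bit i of bitmaps[item] is 1 iff transaction i contains item."""
    bitmaps = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(bitmap):
    """Number of transactions covered by a bitmap (population count)."""
    return bin(bitmap).count("1")


def mine_frequent(transactions, min_support):
    """Level-wise mining: join itemsets by AND-ing bitmaps, pruning as we go.

    Assumption: items are mutually comparable (e.g. all strings) so that
    itemsets can be kept as sorted tuples.
    """
    bitmaps = build_bitmaps(transactions)
    # Frequent 1-itemsets; infrequent items are pruned here.
    level = {
        (item,): bm
        for item, bm in bitmaps.items()
        if support(bm) >= min_support
    }
    frequent = {}
    while level:
        frequent.update({itemset: support(bm) for itemset, bm in level.items()})
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            # Join two k-itemsets only if they share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            bm = level[a] & level[b]  # transactions containing all items of a and b
            if support(bm) >= min_support:
                next_level[a + (b[-1],)] = bm
        level = next_level
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(mine_frequent(data, min_support=3).items()):
        print(itemset, count)
```

After the bitmaps are built, the sketch never re-reads the original transactions: every support count is a bitwise AND followed by a bit count. In a MapReduce setting one could imagine each mapper building bitmaps for its own partition, but how the paper actually distributes this work is not stated in the abstract.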
