
A Real-Time, Efficient AECFP Algorithm for Mining Frequent Items over Data Streams (cited by: 1)

An efficient algorithm AECFP for mining frequent item over data streams
Abstract: Data streams are generated at high speed, flow continuously, and vary unpredictably, so a data stream algorithm must analyze data accurately in real time within limited storage and extract useful knowledge from it. Within an allowed error bound, this paper proposes AECFP, an efficient algorithm for mining frequent items over data streams. AECFP records arriving items in a data structure built on a sample of frequent items, which lets it store samples quickly. When the sample space is full, it quickly deletes the non-frequent item that has the smallest occurrence count and is the oldest, and keeps the other items with the same support count. When a user queries for frequent items, the algorithm mines them from the stream quickly and accurately in real time and adapts to fluctuations in the data. Experiments show that the algorithm processes items quickly, meets low storage requirements, and preserves the accuracy of the mined frequent items.
Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2009, No. 6, pp. 480-482 (3 pages)
Keywords: data streams; data mining; frequent item; ε-approximate; non-frequent item
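
The abstract describes the eviction rule only in prose, so the following is a minimal sketch of a bounded-memory frequent-item counter in that spirit, not the paper's AECFP implementation. The class name `BoundedFrequentItems`, the `Entry` record, the `capacity` parameter, and the query threshold `(support - epsilon) * n` are all assumptions introduced for illustration; the only behavior taken from the abstract is that, when the sample space is full, the item with the smallest count is removed and ties are broken by removing the oldest one.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    count: int    # occurrences seen since the item entered the sample
    arrived: int  # stream position at which the item entered the sample


class BoundedFrequentItems:
    """Hypothetical bounded sample of stream items (illustrative only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity  # assumed fixed size of the sample space
        self.n = 0                # number of stream items processed so far
        self.sample = {}          # item -> Entry

    def add(self, item):
        self.n += 1
        entry = self.sample.get(item)
        if entry is not None:
            entry.count += 1
            return
        if len(self.sample) >= self.capacity:
            # Evict the smallest count; among equal counts, the oldest arrival.
            # A linear scan keeps the sketch short; a heap would be faster.
            victim = min(
                self.sample,
                key=lambda k: (self.sample[k].count, self.sample[k].arrived),
            )
            del self.sample[victim]
        self.sample[item] = Entry(count=1, arrived=self.n)

    def frequent(self, support, epsilon=0.0):
        # Assumed epsilon-approximate query: report items whose stored count
        # reaches (support - epsilon) * n.
        threshold = (support - epsilon) * self.n
        return {k: e.count for k, e in self.sample.items() if e.count >= threshold}


stream = BoundedFrequentItems(capacity=2)
for item in "abacadab":
    stream.add(item)
print(stream.frequent(0.5))  # {'a': 4}
```

Because an evicted item loses its history, stored counts can underestimate true frequencies; the `epsilon` slack in the query is one assumed way to tolerate that, in line with the ε-approximate keyword above.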


