AECFP: a real-time, effective algorithm for mining frequent items over data streams (cited by: 1)

An efficient algorithm AECFP for mining frequent item over data streams


Abstract: Data streams are generated at high speed, flow continuously, and vary unpredictably, so a data stream algorithm must analyze data accurately and in real time within limited storage and extract useful knowledge from it. Within an allowed error bound, this paper proposes AECFP, an effective algorithm for mining frequent items over data streams. AECFP records arriving items in a data structure based on frequent-item samples, so that samples can be saved quickly. When the sample space is full, it quickly deletes the non-frequent item that has the smallest occurrence count and is the oldest, while retaining the other items with the same support count. When a user queries for frequent items, the algorithm mines them from the stream quickly, accurately, and in real time, and it adapts to fluctuations in the data. Experiments show that the algorithm processes items quickly, meets low storage requirements, and preserves the accuracy of the mined frequent items.
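The abstract does not give the data structure or the error bound in detail, so the following is only a minimal sketch of the eviction idea it describes: keep a fixed-capacity sample of counters, and when the sample is full, drop the entry with the smallest count, breaking ties in favor of deleting the oldest. The class name `BoundedSample`, its fields, and the query threshold `(support - epsilon) * total` (borrowed from the usual ε-approximate formulation) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedSample:
    """Hypothetical fixed-capacity sample: item -> [count, arrival index]."""
    capacity: int
    total: int = 0                              # stream items seen so far
    table: dict = field(default_factory=dict)   # item -> [count, first-seen index]

    def observe(self, item):
        """Record one arriving item, evicting a counter if the sample is full."""
        self.total += 1
        entry = self.table.get(item)
        if entry is not None:
            entry[0] += 1
            return
        if len(self.table) >= self.capacity:
            # Victim: smallest count; among equal counts, the oldest entry.
            victim = min(self.table, key=lambda k: (self.table[k][0], self.table[k][1]))
            del self.table[victim]
        self.table[item] = [1, self.total]

    def frequent(self, support, epsilon):
        """Return sampled items whose count reaches (support - epsilon) * total.

        The threshold form is an assumption, not taken from the paper.
        """
        threshold = (support - epsilon) * self.total
        return {k: c for k, (c, _) in self.table.items() if c >= threshold}


sample = BoundedSample(capacity=3)
for x in "abacadab":
    sample.observe(x)
print(sample.frequent(support=0.4, epsilon=0.1))
```

The linear scan in `min` keeps the sketch short. A structure that really needs fast deletion would more plausibly keep the counters in a heap or in count-ordered buckets, which is presumably what the sample-based structure in the paper is for.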

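Authors: 谢玉忠 (Xie Yuzhong), 朱国魂 (Zhu Guohun), 吴春 (Wu Chun)

Affiliation: School of Computer Science and Control, Guilin University of Electronic Technology

Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2009, No. 6, pp. 480-482 (3 pages)

Keywords: data streams; data mining; frequent item; ε-approximate; non-frequent item

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]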



