Mining Recent Frequent Items from a Sliding Window over Data Streams

Abstract: This paper proposes MRFI, an algorithm for mining recent frequent items in a single data stream. The algorithm operates on a time-sensitive sliding window, which keeps the mining results current, and uses a circular queue and a binary sort tree to store and process the stream data simply and efficiently. MRFI is an approximate algorithm that eliminates the influence of historical data on the mining results. Experiments on synthetic data produced by the IBM data generator demonstrate the feasibility and effectiveness of the algorithm.
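The abstract names the building blocks (a time-sensitive sliding window, a circular queue, and a binary sort tree) but gives no details of MRFI itself. The following Python sketch is therefore not the authors' algorithm. It is one plausible way to combine those three structures, under these assumptions: the window is divided into a fixed number of equal time slots, timestamps arrive in non-decreasing order, per-slot counts are exact rather than approximate, and items are mutually comparable. The names `RecentFrequentItems`, `num_slots`, `slot_length`, and `min_support` are hypothetical.

```python
from collections import Counter


class _Node:
    """Node of an (unbalanced) binary sort tree keyed by item."""

    def __init__(self, item):
        self.item = item
        self.count = 0      # occurrences of item inside the current window
        self.left = None
        self.right = None


class RecentFrequentItems:
    """Illustrative sketch only; not the MRFI algorithm from the paper."""

    def __init__(self, num_slots, slot_length):
        self.num_slots = num_slots        # slots kept in the window
        self.slot_length = slot_length    # time units covered by one slot
        # Circular queue: one Counter of item counts per time slot.
        self.slots = [Counter() for _ in range(num_slots)]
        self.current = 0                  # queue index of the newest slot
        self.current_id = 0               # absolute id of the newest slot
        self.total = 0                    # items currently in the window
        self.root = None

    def _node(self, item):
        """Return the tree node for item, inserting it if absent."""
        if self.root is None:
            self.root = _Node(item)
            return self.root
        node = self.root
        while node.item != item:
            side = "left" if item < node.item else "right"
            child = getattr(node, side)
            if child is None:
                child = _Node(item)
                setattr(node, side, child)
            node = child
        return node

    def _advance(self, slot_id):
        """Slide the window forward, expiring one old slot per step."""
        steps = min(slot_id - self.current_id, self.num_slots)
        for _ in range(steps):
            self.current = (self.current + 1) % self.num_slots
            expired = self.slots[self.current]
            for item, c in expired.items():
                self._node(item).count -= c   # remove historical counts
                self.total -= c
            expired.clear()                   # reuse the slot for new data
        self.current_id = slot_id

    def add(self, item, timestamp):
        """Record one occurrence of item at the given timestamp."""
        slot_id = timestamp // self.slot_length
        if slot_id > self.current_id:
            self._advance(slot_id)
        self.slots[self.current][item] += 1
        self._node(item).count += 1
        self.total += 1

    def frequent(self, min_support):
        """Items whose window count is at least min_support * total."""
        threshold = min_support * self.total
        result, stack, node = [], [], self.root
        while stack or node:                  # iterative in-order traversal
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            if node.count > 0 and node.count >= threshold:
                result.append((node.item, node.count))
            node = node.right
        return result
```

In this sketch the circular queue bounds the cost of expiring old data to the contents of one slot at a time, while the tree gives ordered lookup of each item's count within the window. Nodes whose count drops to zero are left in the tree for simplicity; a fuller version would prune them.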

Authors: LIU Chao, GENG Rui

Affiliation: Computing Center, Qiqihar University

Source: Journal of Qiqihar University (Natural Science Edition), 2010, No. 3, pp. 9-13 (5 pages)

Keywords: data stream; frequent patterns; sliding windows; circular queue; binary sort tree

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]


