基于前缀树的数据流容错概要结构构造

Construction of fault-tolerant synopsis over data stream based on prefix-tree

下载PDF

导出

摘要应用于数据流环境的数据挖掘算法应首要考虑算法的时空复杂性,而要实现消耗巨大计算资源的容错模式挖掘则更要专注于算法的效率.容错模式挖掘是为了从被噪声干扰的真实世界数据中获取允许一定程度错配的、更加泛化的有用知识.提出一种新的单遍历、高压缩的容错前缀树形概要结构DSFT-tree(Data Stream Fault-Tolerant Frequent PatternTree),用来捕捉最近到达的数据流中的数据元素,并且能够高效移除过期数据,实现最大限度地降低计算资源消耗.利用滑动窗指针和位向量表达法实现容错树形概要结构的高效重构,并进一步基于滑动窗口技术实现了数据流环境下的容错频繁项挖掘.实验采用IBM数据发生器产生事务数据,在合理时间内最终挖掘频繁项的数量为FP-stream算法的1.5倍. Complexity of data mining algorithm over data stream is the most important and it should be more focused on algorithm efficiency because of the great consumption of algorithm resources.Fault-tolerant frequent pattern mining is more generalized and suitable for extracting interesting knowledge from real-world data stream polluted by noise.An algorithm,called data stream fault-tolerant frequent pattern tree（DSFT-tree）,was proposed.It could achieve a frequency-descending and highly compact prefix-tree structure with a single-pass to capture fault-tolerant frequent itemsets in recent sliding window.To completely and efficiently perform the tree restructuring operation,an efficient mechanism based on sliding window pointer and bit-vector representation were utilized to restructure the tree.The efficient reconstruction mechanism greatly reduced the consumption of calculation resources and achieved fault-tolerant frequent itemsets mining.Experimental transaction database was generated by IBM synthetic data generator.The number of frequent itemsets extracted by DSFT-tree is 1.5 fold greater than that extracted by FP-stream.Experimental results show that developed algorithm is an efficient fault-tolerant synopsis.

作者由育阳张健沛杨志宏由勇

机构地区哈尔滨工程大学计算机学院中国医学科学院药用植物研究所空军航空医学研究所

出处《北京航空航天大学学报》 EI CAS CSCD 北大核心 2011年第5期564-568,共5页 Journal of Beijing University of Aeronautics and Astronautics

基金国家自然科学基金资助项目(61073041)

关键词数据流概要结构容错模式前缀树 data stream synopsis fault-tolerant frequent pattern prefix-tree

分类号 TP311.13 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献8

1Beringer J, Hullermeier E. Online clustering of parallel data streams [ J ]. Data & Knowledge Engineering,2006,58 ( 2 ) : 180 - 204.
2Li H F,Lee S Y. Mining frequent itemsets over data streams using efficient window sliding techniques [ J ]. Expert Systems with Applications ,2009,36 ( 2 ) : 1466 - 1477.
3Chang J H, Lee W S. Online data stream mining of reeent frequent itemsets by sliding window method[J]. Journal of Information Science,2005,31 (2) :76 -90.
4Yu J X,Chong Z H,Lu H J,et al. A false negative approach to mining frequent itemsets from high speed transactional data streams[ J ]. Information Sciences ,2006,176 ( 14 ) : 1986 - 2015.
5Yang C, Fayyad U, Bradley P S. Efficient discovery of error-tolerant frequent itemsets in high dimensions [ C ] //Proc of 2001 ACM lnt Conf on Knowledge Discovery in Databases. San Francisco, CA: Association for Computing Machinery, 2001 : 194 - 203.
6Bashir S, Halim Z, Baig A R. Mining fault tolerant frequent patterns using pattern growth approach [ C ] //AICCSA 08-6th IEEE/ACS International Conference on Computer Systems and Applications. Doha, Qatar: lnst of Elec and Elec Eng Computer Society ,2005 : 172 - 179.
7Zhang S C,Zhang J L,Zhang C Q. EDUA:an efficient algorithm for dynamic database mining [J].Information Sciences, 2007, 177 ( 13 ) :2756 - 2767.
8Chi Y, Wang H, Yu P S, et al. Catch the moment : maintaining closed frequent itemsets over a data stream sliding window [ J ]. Knowledge and Information Systems,2006,10(3 ) :265 -294.

1邹毅,朱晓萍,郭瑞.（N＋M）容错系统优化模型求解的改进粒子群算法[J].科技与企业,2006(6):71-73.
2朱继祥,李元香,夏学文.演化硬件的容错模式研究[J].小型微型计算机系统,2010,31(12):2472-2475. 被引量：7
3杨颖,杨磊.分布式流数据频繁项发现算法的研究[J].计算机应用,2008,28(1):136-139. 被引量：1
4舒平达,陈华辉.支持多时间粒度的数据流上最频繁K项挖掘[J].宁波大学学报（理工版）,2009,22(4):500-505. 被引量：1
5丁山,谢志强,张石,邓庆绪.容错模式下TTCAN系统矩阵的实时调度[J].东北大学学报（自然科学版）,2011,32(8):1084-1087. 被引量：1
6范联伟.浅谈聚类分析在大数据分析中的应用[J].中国电子商务,2014(17):67-67.
7米林.大数据实例及未来发展前景[J].软件工程师,2013(6):22-24.
8韩云.互联网时代：电视会被抛在后面吗？[J].首播,2014,0(9):28-29.
9郑咸义,段雄林.产生式系统的一种新匹配算法[J].华南理工大学学报（自然科学版）,1995,23(5):71-74.
10冯文峰,郭巧,吴素妍.基于多层概要结构的数据流的频繁项集发现算法[J].北京理工大学学报,2006,26(6):512-516. 被引量：1

北京航空航天大学学报

2011年第5期

浏览历史

内容加载中请稍等...

基于前缀树的数据流容错概要结构构造

参考文献8

相关作者

相关机构

相关主题

浏览历史