期刊文献+

基于前缀树的数据流容错概要结构构造

Construction of fault-tolerant synopsis over data stream based on prefix-tree
下载PDF
导出
摘要 应用于数据流环境的数据挖掘算法应首要考虑算法的时空复杂性,而要实现消耗巨大计算资源的容错模式挖掘则更要专注于算法的效率.容错模式挖掘是为了从被噪声干扰的真实世界数据中获取允许一定程度错配的、更加泛化的有用知识.提出一种新的单遍历、高压缩的容错前缀树形概要结构DSFT-tree(Data Stream Fault-Tolerant Frequent PatternTree),用来捕捉最近到达的数据流中的数据元素,并且能够高效移除过期数据,实现最大限度地降低计算资源消耗.利用滑动窗指针和位向量表达法实现容错树形概要结构的高效重构,并进一步基于滑动窗口技术实现了数据流环境下的容错频繁项挖掘.实验采用IBM数据发生器产生事务数据,在合理时间内最终挖掘频繁项的数量为FP-stream算法的1.5倍. Complexity of data mining algorithm over data stream is the most important and it should be more focused on algorithm efficiency because of the great consumption of algorithm resources.Fault-tolerant frequent pattern mining is more generalized and suitable for extracting interesting knowledge from real-world data stream polluted by noise.An algorithm,called data stream fault-tolerant frequent pattern tree(DSFT-tree),was proposed.It could achieve a frequency-descending and highly compact prefix-tree structure with a single-pass to capture fault-tolerant frequent itemsets in recent sliding window.To completely and efficiently perform the tree restructuring operation,an efficient mechanism based on sliding window pointer and bit-vector representation were utilized to restructure the tree.The efficient reconstruction mechanism greatly reduced the consumption of calculation resources and achieved fault-tolerant frequent itemsets mining.Experimental transaction database was generated by IBM synthetic data generator.The number of frequent itemsets extracted by DSFT-tree is 1.5 fold greater than that extracted by FP-stream.Experimental results show that developed algorithm is an efficient fault-tolerant synopsis.
出处 《北京航空航天大学学报》 EI CAS CSCD 北大核心 2011年第5期564-568,共5页 Journal of Beijing University of Aeronautics and Astronautics
基金 国家自然科学基金资助项目(61073041)
关键词 数据流 概要结构 容错模式 前缀树 data stream synopsis fault-tolerant frequent pattern prefix-tree
  • 相关文献

参考文献8

  • 1Beringer J, Hullermeier E. Online clustering of parallel data streams [ J ]. Data & Knowledge Engineering,2006,58 ( 2 ) : 180 - 204.
  • 2Li H F,Lee S Y. Mining frequent itemsets over data streams using efficient window sliding techniques [ J ]. Expert Systems with Applications ,2009,36 ( 2 ) : 1466 - 1477.
  • 3Chang J H, Lee W S. Online data stream mining of reeent frequent itemsets by sliding window method[J]. Journal of Information Science,2005,31 (2) :76 -90.
  • 4Yu J X,Chong Z H,Lu H J,et al. A false negative approach to mining frequent itemsets from high speed transactional data streams[ J ]. Information Sciences ,2006,176 ( 14 ) : 1986 - 2015.
  • 5Yang C, Fayyad U, Bradley P S. Efficient discovery of error-tolerant frequent itemsets in high dimensions [ C ] //Proc of 2001 ACM lnt Conf on Knowledge Discovery in Databases. San Francisco, CA: Association for Computing Machinery, 2001 : 194 - 203.
  • 6Bashir S, Halim Z, Baig A R. Mining fault tolerant frequent patterns using pattern growth approach [ C ] //AICCSA 08-6th IEEE/ACS International Conference on Computer Systems and Applications. Doha, Qatar: lnst of Elec and Elec Eng Computer Society ,2005 : 172 - 179.
  • 7Zhang S C,Zhang J L,Zhang C Q. EDUA:an efficient algorithm for dynamic database mining [J].Information Sciences, 2007, 177 ( 13 ) :2756 - 2767.
  • 8Chi Y, Wang H, Yu P S, et al. Catch the moment : maintaining closed frequent itemsets over a data stream sliding window [ J ]. Knowledge and Information Systems,2006,10(3 ) :265 -294.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部