
An Efficient Continuous-Valued Attribute Handling Algorithm for Mining Concept-Drifting Data Streams Based on the Extended Hash Table

Cited by: 1
Abstract This paper studies continuous-valued attribute handling for mining data streams in which concept drift occurs. A data stream is an incremental, online, real-time data model. VFDT is one of the most successful algorithms for data stream mining when the data follow a stationary distribution; CVFDT is one of the algorithms that effectively address concept drift in data stream mining. Building on CVFDT, the paper proposes HashCVFDT, an extended hash table algorithm that efficiently handles continuous-valued attributes when mining concept-drifting data streams. The algorithm inserts, looks up, and deletes attribute values as fast as a hash table, and it overcomes the hash table's inability to output values in sorted order when selecting the optimal split point for each continuous-valued attribute.
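The abstract does not describe the layout of the extended hash table, so the following is only a minimal sketch of the general idea, not the HashCVFDT structure itself. It assumes a plain Python dict for the per-value class counts and a separately maintained sorted list of distinct values, so that candidate split points can be scanned in order. The names `OrderedValueTable`, `insert`, `delete`, `best_split`, and `entropy` are hypothetical, and information gain is assumed as the split criterion.

```python
import bisect
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in bits) of a Counter of class counts."""
    n = sum(counter.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counter.values() if c > 0)


class OrderedValueTable:
    """Hypothetical sketch: hash table of value -> class counts, plus a sorted key index."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # attribute value -> Counter of class labels
        self.keys = []                      # distinct attribute values, kept sorted
        self.totals = Counter()             # class label -> count over all values

    def insert(self, value, label):
        """Record one example; only a previously unseen value touches the sorted index."""
        if value not in self.counts:
            bisect.insort(self.keys, value)
        self.counts[value][label] += 1
        self.totals[label] += 1

    def delete(self, value, label):
        """Forget one example, e.g. when it slides out of the drift window."""
        bucket = self.counts.get(value)
        if bucket is None or bucket[label] == 0:
            raise KeyError((value, label))
        bucket[label] -= 1
        self.totals[label] -= 1
        if bucket[label] == 0:
            del bucket[label]
        if not bucket:  # no examples left for this value: drop it from both structures
            del self.counts[value]
            self.keys.pop(bisect.bisect_left(self.keys, value))

    def best_split(self):
        """Return (threshold, gain) for the best test `value <= threshold`."""
        n = sum(self.totals.values())
        base = entropy(self.totals)
        left, best = Counter(), (None, 0.0)
        for value in self.keys[:-1]:  # the largest value cannot separate anything
            left.update(self.counts[value])
            right = self.totals - left
            n_left = sum(left.values())
            gain = (base
                    - (n_left / n) * entropy(left)
                    - ((n - n_left) / n) * entropy(right))
            if gain > best[1]:
                best = (value, gain)
        return best


table = OrderedValueTable()
for v, y in [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]:
    table.insert(v, y)
threshold, gain = table.best_split()
```

In this sketch, updates to an already-seen value cost a single dict operation, while a new distinct value also pays for a sorted-list insertion that is linear in the number of distinct values. The paper's extended hash table may well avoid that cost in a different way.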
Source Computer Engineering & Science (《计算机工程与科学》), CSCD, 2008, No. 8, pp. 65-68, 74 (5 pages)
Funding National Natural Science Foundation of China (grants 60573057, 60473057, 90604007)
Keywords data stream mining; CVFDT; continuous-valued attribute; concept drift; extended hash table

