An Efficient Hash-Linked-List-Based Algorithm for Handling Continuous Attributes under Concept Drift (cited by: 1)

An Efficient Continuous-Valued Attribute Handling Algorithm for Mining Concept-Drifting Data Streams Based on the Extended Hash Table


Abstract: This paper studies continuous-valued attribute handling for mining data streams in which concept drift occurs. A data stream is an incremental, online, real-time data model. VFDT is one of the most successful algorithms for data stream mining when the data follow a stationary distribution, and CVFDT is one of the effective algorithms for handling concept drift in data stream mining. Building on CVFDT, the paper proposes HashCVFDT, an extended-hash-table algorithm for handling continuous-valued attributes in concept-drifting data streams. The algorithm inserts, looks up, and deletes attribute values as fast as a hash table, and it overcomes the hash table's inability to output values in sorted order when selecting the optimal split point of each continuous attribute.

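Authors: 王涛 (Wang Tao), 李舟军 (Li Zhoujun), 颜跃进 (Yan Yuejin)

Affiliations: School of Computer Science, National University of Defense Technology; School of Computer Science, Beihang University

Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2008, No. 8, pp. 65-68, 74 (5 pages)

Funding: National Natural Science Foundation of China (60573057, 60473057, 90604007)

Keywords: data stream mining; CVFDT; continuous-valued attribute; concept drift; extended hash table

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]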
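
The abstract describes a structure that keeps hash-table speed for inserting, looking up, and deleting attribute values while still allowing ordered traversal when a split point is chosen. The following is a minimal Python sketch of that idea, not the paper's HashCVFDT implementation: the class `OrderedValueTable`, the helper `best_split`, the lazy re-sorting of keys, and the use of information gain as the split criterion are all assumptions made for illustration, since the record does not give the internal layout of the extended hash table.

```python
import math
from collections import Counter, defaultdict


class OrderedValueTable:
    """Hypothetical stand-in: hash table of value -> class counts, keys sorted lazily."""

    def __init__(self):
        self._counts = defaultdict(Counter)  # attribute value -> Counter(label -> count)
        self._sorted_keys = []               # cached ascending list of distinct values
        self._dirty = False                  # True when the cached order is stale

    def insert(self, value, label):
        if value not in self._counts:
            self._dirty = True               # a new distinct value invalidates the cache
        self._counts[value][label] += 1

    def delete(self, value, label):
        """Forget one example, e.g. when it leaves a sliding window."""
        counter = self._counts.get(value)
        if counter is None or counter[label] == 0:
            raise KeyError((value, label))
        counter[label] -= 1
        if counter[label] == 0:
            del counter[label]
        if not counter:
            del self._counts[value]
            self._dirty = True

    def lookup(self, value):
        return dict(self._counts.get(value, {}))

    def ordered_items(self):
        """Ordered output: re-sort only if the set of distinct values changed."""
        if self._dirty:
            self._sorted_keys = sorted(self._counts)
            self._dirty = False
        return [(v, self._counts[v]) for v in self._sorted_keys]


def entropy(counter):
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counter.values() if n)


def best_split(table):
    """Return (threshold, gain) for the best binary test value <= threshold."""
    items = table.ordered_items()
    total = Counter()
    for _, counts in items:
        total.update(counts)
    n = sum(total.values())
    base = entropy(total)
    left = Counter()
    best = (None, 0.0)
    # Candidate thresholds are midpoints between adjacent distinct values.
    for (v, counts), (v_next, _) in zip(items, items[1:]):
        left.update(counts)
        right = total - left
        n_left = sum(left.values())
        gain = base - (n_left / n) * entropy(left) - ((n - n_left) / n) * entropy(right)
        if gain > best[1]:
            best = ((v + v_next) / 2, gain)
    return best
```

In this sketch, `insert`, `lookup`, and `delete` are ordinary dictionary operations, and the sort cost is paid only when `ordered_items` is called after the set of distinct values has changed. A structure that maintains order incrementally, as the "hash linked list" in the title suggests, would avoid even that re-sort.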



