
Very fast decision tree classification algorithm based on red-black tree for data stream with continuous attributes
Abstract: To improve the efficiency of classification mining on data streams with continuous attributes, a very fast decision tree classification algorithm based on the red-black tree, called VFDT_RBT, is designed and implemented. The algorithm uses a red-black tree to handle sample insertions more efficiently, so that inserting n samples costs O(n log n) even when the values arrive in sorted order. It uses a stack, together with the fact that an in-order traversal of a red-black tree visits values in sorted order, to reduce the time complexity of selecting the best split threshold. The Hoeffding inequality is used to determine the number of samples needed to fix the split threshold of a continuous attribute. Split attributes are selected under the principle that a continuous attribute may appear more than once in the tree, which improves classification accuracy. Classification experiments on several data sets show that VFDT_RBT has lower processing time and higher classification accuracy than the existing VFDTc, and that it is better suited to samples with many attributes.
Authors: Chen Yu (陈煜), Li Lingjuan (李玲娟)
Source: Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition (《南京邮电大学学报(自然科学版)》), Peking University Core Journal, 2017, No. 2, pp. 86-90 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61302158, 61571238)
Keywords: data streams; red-black tree; continuous attribute; VFDTc; decision tree
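
The split-threshold search summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is a plain binary search tree with no red-black rebalancing (so it does not have the O(n log n) guarantee for sorted input), information gain is assumed as the split measure, and applying the Hoeffding bound to the gap between the two best candidate thresholds is an assumption about how the bound is used. All names (`Node`, `insert`, `inorder`, `best_split`, `hoeffding_bound`) are hypothetical.

```python
import math
from collections import Counter


class Node:
    """One distinct attribute value with per-class sample counts."""
    def __init__(self, value):
        self.value = value
        self.counts = Counter()
        self.left = None
        self.right = None


def insert(root, value, label):
    """Insert a sample into a plain BST (assumption: rebalancing omitted;
    a red-black tree would also recolor and rotate here)."""
    if root is None:
        root = Node(value)
        root.counts[label] += 1
        return root
    node = root
    while True:
        if value == node.value:
            node.counts[label] += 1
            return root
        branch = "left" if value < node.value else "right"
        child = getattr(node, branch)
        if child is None:
            child = Node(value)
            child.counts[label] += 1
            setattr(node, branch, child)
            return root
        node = child


def inorder(root):
    """Yield nodes in ascending value order using an explicit stack."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node
        node = node.right


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def best_split(root):
    """Return (gain, threshold) pairs, best first, from one sorted scan."""
    nodes = list(inorder(root))
    total = Counter()
    for n in nodes:
        total.update(n.counts)
    n_total = sum(total.values())
    base = entropy(total)
    left, ranked = Counter(), []
    for cur, nxt in zip(nodes, nodes[1:]):
        left.update(cur.counts)          # running class counts left of the cut
        right = total - left
        n_left = sum(left.values())
        gain = (base - n_left / n_total * entropy(left)
                - (n_total - n_left) / n_total * entropy(right))
        ranked.append((gain, (cur.value + nxt.value) / 2))
    return sorted(ranked, reverse=True)


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), the standard Hoeffding bound."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


root = None
for v, y in [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]:
    root = insert(root, v, y)
ranked = best_split(root)
eps = hoeffding_bound(value_range=1.0, delta=0.05, n=6)  # R = log2(2 classes)
confident = ranked[0][0] - ranked[1][0] > eps
```

In this toy run the best candidate threshold is the midpoint between 3 and 7. With only six samples the bound is still large; as more samples arrive, `eps` shrinks and the threshold can be fixed with the chosen confidence.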