An Applicable Multivariate Decision Tree Algorithm for Categorical Attribute Data (cited by: 15)
Abstract: Most multivariate decision trees can combine only numerical attributes and cannot directly classify data sets that contain categorical attributes. To address this problem, a multivariate decision tree algorithm that can combine multiple attribute types (CMDT) is proposed. The algorithm counts the frequency distribution of each categorical attribute's values within each class or each cluster, and uses these statistics to define the center of a sample set on the categorical attributes and the distance from a sample to that center. A weighted k-means algorithm is then used to split the non-terminal nodes of the decision tree. Decision trees built with this node-splitting method are applicable to numerical data, categorical data, and mixed data. Experimental results show that, on data sets of all these types, the classification models built by the algorithm achieve higher generalization accuracy and more concise tree structures than classic decision tree algorithms.
Authors: LIU Zhen-yu; SONG Xiao-ying (Software Center, Northeastern University, Shenyang 110819, China; Key Laboratory of Network Security and Computing Technology, Dalian Neusoft University of Information, Dalian 116023, China)
Source: Journal of Northeastern University (Natural Science), 2020, No. 11, pp. 1521-1527 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61772101, 61602075); Key Research and Development Program of Liaoning Province (2018).
Keywords: decision tree; categorical attribute; multivariate decision tree; node split; k-means
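
The abstract describes the node split only in words, so the following Python sketch is an illustration under stated assumptions and not the paper's CMDT implementation. It assumes that the "center" of a sample set on categorical attributes is the per-attribute relative frequency of each value, that the distance from a sample to a center is a weighted sum of `1 - frequency` over attributes, and that the initial clusters at a node are the class groups. The function names (`categorical_center`, `distance`, `split_node`) and the caller-supplied attribute `weights` are hypothetical; the abstract does not say how the weights are chosen, and numerical attributes are left out here.

```python
from collections import Counter


def categorical_center(samples):
    """Per-attribute relative frequency of each value in a sample set (assumed center)."""
    n = len(samples)
    center = []
    for j in range(len(samples[0])):
        counts = Counter(row[j] for row in samples)
        center.append({value: c / n for value, c in counts.items()})
    return center


def distance(sample, center, weights):
    """Assumed distance: weighted sum of (1 - frequency of the sample's value)."""
    return sum(
        w * (1.0 - freq.get(value, 0.0))
        for value, freq, w in zip(sample, center, weights)
    )


def split_node(samples, labels, weights, max_iter=20):
    """k-means-style split of one node, seeded with the class groups (assumption)."""
    classes = sorted(set(labels))
    k_range = range(len(classes))
    assign = [classes.index(y) for y in labels]
    centers = [None] * len(classes)
    for _ in range(max_iter):
        for k in k_range:
            members = [s for s, a in zip(samples, assign) if a == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = categorical_center(members)
        new_assign = [
            min(k_range, key=lambda k: distance(s, centers[k], weights))
            for s in samples
        ]
        if new_assign == assign:  # converged: no sample changed cluster
            break
        assign = new_assign
    return centers, assign


# Toy categorical data: (color, shape) with two classes.
samples = [
    ("red", "round"), ("red", "round"), ("red", "square"),
    ("blue", "square"), ("blue", "square"), ("blue", "round"),
]
labels = ["a", "a", "a", "b", "b", "b"]
weights = [1.0, 1.0]  # hypothetical attribute weights

centers, assign = split_node(samples, labels, weights)
print(assign)
```

Each resulting cluster would become a child of the node, and a new sample would be routed to the child whose center is nearest under the same distance. Seeding the clusters from the class labels is one plausible reading of "in each class or each cluster"; other initializations are possible.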
