Abstract
Selecting the attribute for each node is a key step in building a decision tree. In classical decision tree algorithms such as ID3 and C4.5, the node attribute is chosen by computing the information gain or gain ratio from the number of samples in each subset. For continuous attributes, however, discretization makes the subset membership of samples near a partition boundary fuzzy, so computing with sample counts is not entirely reasonable. To address this problem, this paper adopts fuzzy set theory and uses a fuzziness measure in place of the sample count in the gain ratio computation, giving a feasible way to obtain a measure of uncertainty in decision tree classification.
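The idea can be illustrated with a minimal sketch. It is not the paper's own formulation, which the abstract does not give. It assumes that "fuzziness in place of sample count" means summing each sample's membership degree in a subset instead of counting it as 0 or 1. The function names `entropy`, `soft_memberships`, and `gain_ratio`, and the linear ramp around the split threshold, are hypothetical choices made for illustration.

```python
import math


def entropy(weights):
    """Shannon entropy in bits of a list of non-negative weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)


def soft_memberships(x, threshold, width):
    """Return [low, high] membership degrees of x for a split at threshold.

    Assumption: a linear ramp of half-width `width` (> 0) around the threshold.
    """
    high = (x - (threshold - width)) / (2 * width)
    high = min(1.0, max(0.0, high))
    return [1.0 - high, high]


def gain_ratio(memberships, labels):
    """Gain ratio where memberships[i][j] is sample i's degree in subset j.

    With 0/1 memberships this reduces to the count-based gain ratio.
    """
    classes = sorted(set(labels))
    n_subsets = len(memberships[0])
    # Class weights at the parent node: each sample contributes its total membership.
    parent = [
        sum(sum(m) for m, y in zip(memberships, labels) if y == c)
        for c in classes
    ]
    total = sum(parent)
    subset_sizes = []
    conditional = 0.0
    for j in range(n_subsets):
        per_class = [
            sum(m[j] for m, y in zip(memberships, labels) if y == c)
            for c in classes
        ]
        size = sum(per_class)
        subset_sizes.append(size)
        conditional += size / total * entropy(per_class)
    gain = entropy(parent) - conditional
    split_info = entropy(subset_sizes)
    return gain / split_info if split_info > 0 else 0.0


values = [1.0, 4.5, 5.5, 9.0]
labels = ["a", "a", "b", "b"]
crisp = [[1.0, 0.0] if v <= 5.0 else [0.0, 1.0] for v in values]
fuzzy = [soft_memberships(v, threshold=5.0, width=1.0) for v in values]
print(gain_ratio(crisp, labels), gain_ratio(fuzzy, labels))
```

In this toy data the two samples near the threshold belong partly to both subsets, so the fuzzy gain ratio comes out lower than the crisp one, which reflects the uncertainty at the boundary.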
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2008, Issue 25, pp. 146-148, 154 (4 pages in total)
Computer Engineering and Applications
Keywords
decision tree
fuzzy set
fuzzy gain ratio
clustering