Abstract
To classify samples with a decision tree, the classification of each attribute must be known in advance. For attributes whose classification is unspecified or unknown, the "high cohesion, low coupling" principle is used to classify the attribute optimally. On the basis of this attribute classification, the expected information of each attribute (based on information entropy) and the corresponding information gain are used to select the best decision attribute, and branches are drawn from that attribute to form the decision tree. This paper describes the theory and algorithm for optimal attribute classification, discusses the algorithm for selecting the best decision attribute to construct a decision tree, and verifies the approach on a concrete application example, for which a decision tree is constructed.
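The attribute-selection step named in the abstract can be illustrated with a minimal sketch. It assumes the standard ID3-style definitions of expected information (Shannon entropy) and information gain; the paper's exact formulation may differ. The function names and the toy samples are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Expected information (Shannon entropy, in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value each row takes for `attribute`.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Pick the attribute with the largest information gain as the decision attribute."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy samples (not from the paper), with already-classified attributes.
samples = [
    {"income": "high", "student": "no", "buys": "no"},
    {"income": "high", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "no", "buys": "no"},
    {"income": "low", "student": "yes", "buys": "yes"},
]
print(best_attribute(samples, ["income", "student"], "buys"))
```

In this toy data, splitting on "student" separates the classes completely while "income" does not, so "student" would be chosen as the root of the tree. The same selection would then be repeated on each branch.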
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2004, Issue 1, pp. 186-189 (4 pages)
Computer Engineering and Applications
Keywords
attribute
classification
data mining
information entropy
decision tree