Abstract
This paper studies the computational efficiency of decision tree classification. Based on the characteristics of the information gain calculation, we introduce the concept of the upper convex function (上凸函数) to speed up the computation of information gain during decision tree induction. Using the "consistency theorem" and the "special consistency theorem" that we propose, we prove theoretically that a decision tree built with the improved information gain calculation has the same classification accuracy as the tree built by the original ID3 algorithm. Experiments on large datasets further show that, for datasets of the same size, the improved algorithm is computationally more efficient than the original one, and that this efficiency gain tends to grow as the dataset size increases.
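For orientation, the quantity the abstract refers to can be sketched in a few lines of Python. This is a minimal illustration of the standard ID3 information gain (entropy of the labels minus the expected information after splitting on an attribute), not the improved calculation proposed in the paper, whose details are not given here. The function names `entropy`, `information_gain`, and `best_attribute`, and the toy data, are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr_index):
    """Standard ID3 gain: entropy of labels minus expected information
    after partitioning the rows on the attribute at attr_index."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Expected information: size-weighted average entropy of the partitions.
    expected = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - expected


def best_attribute(rows, labels):
    """Index of the attribute with the largest information gain."""
    return max(range(len(rows[0])),
               key=lambda i: information_gain(rows, labels, i))


# Hypothetical toy data: attribute 0 determines the class, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
```

In this sketch each candidate attribute needs one logarithm per class in every partition, and ID3 repeats that at every node, which is why the cost of this calculation matters on large datasets.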
Source
Chinese Journal of Management Science (《中国管理科学》)
CSSCI
2004, No. 4, pp. 144-148 (5 pages)
Keywords
decision tree
ID3 algorithm
convex function (上凸函数)
information entropy
expected information