Abstract
To improve the accuracy of decision trees built on large-scale data sets, a data sampling method is proposed. A decision tree is first pre-generated from the data set using a fast decision tree algorithm. The tree is then traversed breadth-first (BFS) to partition the data into subsets that satisfy predefined limits, each subset is randomly sampled at a given ratio, and the sampled results are merged into the target data set. Experiments that sample a UCI data set and evaluate the result with existing decision tree algorithms show that the method outperforms traditional random sampling, and that trees built from its samples achieve a higher average accuracy.
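The four steps named in the abstract (pre-generate a tree, partition it breadth-first, sample each partition, merge) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which tree learner or which "predefined limits" are used, so `build_tree` is a hypothetical stand-in that splits on feature medians and ignores class labels, and the limit is assumed to be a maximum partition size (`max_size`). All function and parameter names are invented for the example.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    rows: list                      # indices of the records that reach this node
    children: list = field(default_factory=list)


def build_tree(data, rows, depth=0, max_depth=4, min_rows=2):
    """Hypothetical stand-in for a fast tree learner (assumption):
    split on the median of one feature per level, ignoring labels."""
    node = Node(rows)
    if depth >= max_depth or len(rows) < 2 * min_rows:
        return node
    feature = depth % len(data[0])
    values = sorted(data[i][feature] for i in rows)
    threshold = values[len(values) // 2]
    left = [i for i in rows if data[i][feature] < threshold]
    right = [i for i in rows if data[i][feature] >= threshold]
    if left and right:              # keep the node as a leaf if the split is degenerate
        node.children = [
            build_tree(data, left, depth + 1, max_depth, min_rows),
            build_tree(data, right, depth + 1, max_depth, min_rows),
        ]
    return node


def partition_bfs(root, max_size):
    """Breadth-first traversal: a node becomes one partition once it is
    small enough (assumed limit) or is a leaf; otherwise its children are visited."""
    partitions = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if len(node.rows) <= max_size or not node.children:
            partitions.append(node.rows)
        else:
            queue.extend(node.children)
    return partitions


def sample_partitions(partitions, ratio, rng):
    """Randomly sample each partition at the given ratio (at least one
    record per partition) and merge the results into one index list."""
    sample = []
    for rows in partitions:
        k = max(1, round(len(rows) * ratio))
        sample.extend(rng.sample(rows, k))
    return sample
```

A typical call would build the tree over all row indices, e.g. `root = build_tree(data, list(range(len(data))))`, then run `sample_partitions(partition_bfs(root, max_size=30), 0.1, random.Random(0))` to obtain the indices of the target data set. Because the partitions are disjoint and together cover every record, each region of the tree contributes to the sample, which plain random sampling over the whole data set does not guarantee.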
Source
《微型机与应用》
2010, No. 21, pp. 5-6, 13 (3 pages in total)
Microcomputer & Its Applications