Abstract
This paper proposes an improved SLIQ decision tree classification algorithm that addresses a shortcoming of the original SLIQ algorithm: the large number of Gini index (GINI index) values that must be computed at every node of the decision tree. First, it introduces the basic concept of the density of data distribution (D3) and, building on the Gini index, uses the difference in data distribution density (ED3) to improve SLIQ. Second, it applies the SLIQ algorithm to synthetic (comprehensive) evaluation. Results on a worked example show that the improved algorithm computes far fewer Gini indexes when searching for the best split, which reduces the amount of computation, lowers the cost of sorting and of finding the optimal split point, and yields a smaller decision tree.
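To make the cost mentioned in the abstract concrete, the following is a minimal sketch of the baseline exhaustive search, in which one weighted Gini index is evaluated per candidate split point of a sorted numeric attribute. It does not implement the paper's density-based improvement, which the abstract does not define. The function names `gini` and `best_numeric_split` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(counts, total):
    """Gini index 1 - sum(p_i^2) for a mapping of class -> count."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(values, labels):
    """Baseline scan (illustrative, not the paper's improved method).

    Sorts the attribute once, then evaluates the weighted Gini index at the
    midpoint between each pair of distinct neighbouring values.
    Returns (threshold, weighted_gini, candidates_evaluated).
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    left = Counter()                                # class counts left of the split
    right = Counter(label for _, label in pairs)    # class counts right of the split
    best_threshold, best_score = None, float("inf")
    evaluated = 0
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no split point between equal attribute values
        n_left, n_right = i + 1, n - (i + 1)
        score = (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)
        evaluated += 1
        if score < best_score:
            best_threshold, best_score = (value + next_value) / 2, score
    return best_threshold, best_score, evaluated
```

The third return value counts how many Gini evaluations the scan performed for one attribute at one node. This count is the quantity the abstract reports as greatly reduced by the improved algorithm.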
Source
Modern Computer (《现代计算机》)
2009, No. 10, pp. 54-56, 83 (4 pages)
Keywords
SLIQ (Supervised Learning In Quest)
Density of Data Distribution
Evaluation