Improved C4.5 decision tree algorithm based on cosine similarity (cited by: 16)
Abstract: The traditional C4.5 algorithm tends to produce redundant rules, oversized decision trees, and slow classification. To address these problems, an improved C4.5 decision tree algorithm based on cosine similarity is proposed. The information entropy and gain ratio of each attribute are computed; if the information entropies of any two values of an attribute differ by no more than a small amount, the cosine similarity of those two attribute values is calculated. Attribute values whose similarity falls within the threshold range are merged, the information gain ratio of the merged attribute is recomputed, and the computation then proceeds as in the traditional C4.5 algorithm. General check-up data sampled from a hospital were used for simulation. The results show that the proposed algorithm effectively reduces the dimensionality of the split attributes, shrinks the decision tree, reduces redundant rules, and improves classification speed.
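The merging step described in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions, because the abstract does not specify them: each attribute value is represented by its vector of class counts, the parameters `eps` (entropy-difference tolerance) and `sim_threshold` (cosine-similarity cutoff) are hypothetical names with arbitrary defaults, and merging is done in a single greedy pass. This is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (in bits) of a vector of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def class_vectors(values, labels):
    """Assumption: represent each attribute value by its class-count vector."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    return {v: [counts[v][c] for c in classes] for v in counts}

def merge_similar_values(values, labels, eps=0.05, sim_threshold=0.95):
    """Map each attribute value to a representative value.

    Two values are merged when their entropies differ by at most eps
    and the cosine similarity of their class-count vectors is at least
    sim_threshold (greedy single pass; both parameters are hypothetical).
    """
    vecs = class_vectors(values, labels)
    names = sorted(vecs)
    rep = {v: v for v in names}
    for i, a in enumerate(names):
        if rep[a] != a:
            continue  # a was already merged into an earlier value
        for b in names[i + 1:]:
            if rep[b] != b:
                continue
            close = abs(entropy(vecs[a]) - entropy(vecs[b])) <= eps
            if close and cosine(vecs[a], vecs[b]) >= sim_threshold:
                rep[b] = a
    return rep

def gain_ratio(values, labels):
    """C4.5 gain ratio of one categorical attribute."""
    n = len(labels)
    base = entropy(list(Counter(labels).values()))
    vecs = class_vectors(values, labels)
    cond = sum(sum(vec) / n * entropy(vec) for vec in vecs.values())
    split_info = entropy([sum(vec) for vec in vecs.values()])
    return (base - cond) / split_info if split_info else 0.0

# Toy data: values "a" and "b" have identical class distributions.
values = ["a", "a", "b", "b", "c", "c", "c", "c"]
labels = ["yes", "no", "yes", "no", "yes", "yes", "yes", "yes"]
rep = merge_similar_values(values, labels)
merged = [rep[v] for v in values]
print(rep, gain_ratio(values, labels), gain_ratio(merged, labels))
```

In this toy example the values `a` and `b` are merged, so the attribute has two values instead of three. The information gain is unchanged while the split information drops, so the gain ratio of the merged attribute is higher than before merging.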
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2018, No. 1, pp. 120-125 (6 pages)
Funding: Natural Science Foundation of Shandong Province (ZR2014FL019); Shandong Province Higher Education Science and Technology Program (J14LN31)
Keywords: data mining; C4.5 algorithm; cosine similarity; decision tree; dimensionality reduction
