
A Multi-Attribute Discretization Algorithm with Global Clustering (cited by: 3)

Synchronized Continuous Attributes Discretization Based on Ameva
Abstract: To reduce both the loss of useful information after continuous attributes are discretized and the total number of cut points in the information system, a multi-attribute discretization algorithm with a global clustering effect is proposed. The algorithm tentatively inserts a cut point into each attribute and measures the effect on the approximation classification quality of the information system. That effect decides which attribute receives the next cut point, so the best cut point is selected across all condition attributes rather than one attribute at a time. Within the chosen attribute, the position of the best cut point is determined from the candidate points by the Ameva statistic. The algorithm terminates when the approximation classification quality of the decision table is preserved. The algorithm was tested on several UCI machine learning data sets and compared with several classic discretization algorithms. The results show that C4.5 decision tree classifiers built on data discretized by the proposed algorithm achieve good classification accuracy with fewer nodes.
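The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard Ameva criterion, Ameva(k) = χ²(k) / (k(ℓ − 1)) for k intervals and ℓ classes, and the usual rough-set approximation classification quality γ = |POS_C(D)| / |U|. The function names (`ameva`, `approximation_quality`, `best_cut`) and the choice of midpoints between adjacent distinct values as candidate cut points are hypothetical, introduced only for illustration.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def ameva(values, labels, cuts):
    """Ameva criterion of one attribute for a given set of cut points.

    Intervals are half-open: a value equal to a cut point falls into the
    interval to the right of that cut (behaviour of bisect_right).
    """
    cuts = sorted(cuts)
    k = len(cuts) + 1                      # number of intervals
    n_classes = len(set(labels))           # number of decision classes
    if n_classes < 2:
        raise ValueError("Ameva needs at least two decision classes")
    n = len(values)
    intervals = [bisect_right(cuts, v) for v in values]
    cell = Counter(zip(labels, intervals))  # n_ij: class i, interval j
    row = Counter(labels)                   # n_i. : class totals
    col = Counter(intervals)                # n_.j : interval totals
    s = sum(c * c / (row[y] * col[j]) for (y, j), c in cell.items())
    chi2 = n * (s - 1.0)
    return chi2 / (k * (n_classes - 1))


def approximation_quality(rows, decisions):
    """Fraction of objects whose condition-equivalence class is consistent,
    i.e. the size of the positive region divided by the size of the universe."""
    seen = defaultdict(set)
    size = Counter()
    for r, d in zip(rows, decisions):
        key = tuple(r)
        seen[key].add(d)
        size[key] += 1
    positive = sum(size[key] for key, ds in seen.items() if len(ds) == 1)
    return positive / len(decisions)


def best_cut(values, labels, existing_cuts=()):
    """Candidate cut (midpoint between adjacent distinct values, an assumption)
    that maximises Ameva when added to the existing cuts; None if no candidate."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    candidates = [c for c in candidates if c not in existing_cuts]
    if not candidates:
        return None
    return max(candidates,
               key=lambda c: ameva(values, labels, list(existing_cuts) + [c]))
```

For example, `best_cut([1, 2, 3, 4], [0, 0, 1, 1])` picks the cut that separates the two classes cleanly. A full multi-attribute procedure would repeat this per attribute and compare the resulting `approximation_quality` values. How the paper ranks attributes and breaks ties is not stated in the abstract, so that loop is left out.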
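Source: Journal of Xi'an Jiaotong University (indexed in EI, CAS, CSCD, PKU Core), 2011, No. 9, pp. 1-5 (5 pages)
Funding: National Natural Science Foundation of China (51105296); Open Project of the State Key Laboratory for Manufacturing Systems Engineering; Fundamental Research Funds for the Central Universities
Keywords: statistics; continuous attributes; discretization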

References (14)

  • [1] XU E, SHAO Liangshan. A new discretization approach of continuous attributes [C] // Proceedings of the 2010 Asia-Pacific Conference on Wearable Computing Systems: APWC 2010. Piscataway, NJ, USA: IEEE Computer Society, 2010: 136-138.
  • [2] MIZIANTY M J, KURGAN L A, OGIELA M R. Discretization as the enabling technique for the Naive Bayes and semi-Naive Bayes-based classification [J]. Knowledge Engineering Review, 2010, 25(4): 421-449.
  • [3] KERBER R. ChiMerge: discretization of numeric attributes [C] // Proceedings of Ninth National Conference on Artificial Intelligence. Menlo Park, CA, USA: AAAI Press, 1992: 123-128.
  • [4] LIU Huan, SETIONO R. Feature selection via discretization [J]. IEEE Transactions on Knowledge and Data Engineering, 1997, 9(4): 642-645.
  • [5] TAY E H, SHEN L. A modified chi2 algorithm for discretization [J]. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(3): 666-670.
  • [6] SU C T, HSU J H. An extended chi2 algorithm for discretization of real value attributes [J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(3): 437-441.
  • [7] BOULLE M. Khiops: a statistical discretization method of continuous attributes [J]. Machine Learning, 2004, 55(1): 53-69.
  • [8] GONZALEZ A L, CUBEROS F J, VELASCO F. Ameva: an autonomous discretization algorithm [J]. Expert Systems with Applications, 2008, 36(3): 5327-5332.
  • [9] LIU Jing, WANG Guoyin, HU Feng. Rough set discretization algorithm based on cut point discernibility [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2010, 22(2): 257-261.
  • [10] MERZ C J, MURPHY P M. UCI repository of machine learning database [EB/OL]. [2003-08-16]. http://www.ics.uci.edu/~mlearn/MLRepository.html.
