Modification to Algorithms of the Series of Chi2 Algorithm (cited by: 2)
Abstract: The Chi2 series of algorithms (including the Modified Chi2 algorithm and the Extended Chi2 algorithm, the most recent algorithm in this domain) are important methods for discretizing continuous attributes, grounded in probability and statistics. This paper analyzes the Chi2-related algorithms in depth, points out their shortcomings, and proposes a new discretization method for continuous attributes, the Rectified Chi2 algorithm. The new algorithm introduces a new criterion for merging intervals and discretizes real-valued attributes more reasonably and effectively. Building on this, and because using only the maximal difference as the merging criterion is flawed, a merging method based on a difference sequence is proposed, which substantially improves the discretization results of the Chi2 series of algorithms. Experimental results demonstrate the effectiveness of the proposed algorithms.
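To make the interval-merging idea concrete, here is a minimal Python sketch of the bottom-up chi-square merging loop that ChiMerge introduced and the Chi2 family builds on. It is not the Rectified Chi2 criterion or the difference-sequence method described above, whose details the abstract does not give. Assumptions: the names `chi2_statistic` and `merge_intervals` are hypothetical, the caller supplies a single fixed chi-square threshold, and each step merges the adjacent pair with the maximal difference (threshold minus statistic). With one shared threshold, this is equivalent to merging the pair with the smallest statistic.

```python
from typing import Hashable, List, Sequence


def chi2_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Pearson chi-square statistic for two adjacent intervals.

    a[j] and b[j] are the numbers of samples of class j in each interval.
    """
    ra, rb = sum(a), sum(b)
    n = ra + rb
    chi2 = 0.0
    for j in range(len(a)):
        col = a[j] + b[j]
        if col == 0:
            continue  # class absent from both intervals: contributes nothing
        for observed, row_total in ((a[j], ra), (b[j], rb)):
            expected = row_total * col / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def merge_intervals(values: Sequence[float], labels: Sequence[Hashable],
                    classes: Sequence[Hashable], threshold: float) -> List[float]:
    """Hypothetical bottom-up merging; returns the lower bound of each final interval."""
    index = {c: i for i, c in enumerate(classes)}
    bounds: List[float] = []
    counts: List[List[int]] = []
    # Start with one interval per distinct value, in sorted order.
    for v, y in sorted(zip(values, labels), key=lambda t: t[0]):
        if not bounds or v != bounds[-1]:
            bounds.append(v)
            counts.append([0] * len(classes))
        counts[-1][index[y]] += 1
    while len(counts) > 1:
        # Difference between the threshold and each adjacent pair's statistic.
        diffs = [threshold - chi2_statistic(counts[i], counts[i + 1])
                 for i in range(len(counts) - 1)]
        i = max(range(len(diffs)), key=diffs.__getitem__)  # first maximal difference
        if diffs[i] <= 0:
            break  # every adjacent pair differs significantly: stop merging
        counts[i] = [x + y for x, y in zip(counts[i], counts[i + 1])]
        del counts[i + 1]
        del bounds[i + 1]
    return bounds


print(merge_intervals([1, 2, 3, 4, 5, 6], list("aaabbb"), ["a", "b"], 2.7))  # [1, 4]
```

On this toy attribute, the three class-`a` values and the three class-`b` values each collapse into one interval, leaving a single cut point at 4. Per the abstract, the Chi2 variants differ mainly in how the merging criterion and stopping rule are chosen, not in this basic loop.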
Source: Journal of Chinese Computer Systems (小型微型计算机系统), CSCD-indexed, Peking University core journal, 2009, No. 3, pp. 524-529 (6 pages)
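Funding: National Natural Science Foundation of China (60372071); Scientific Research Fund for Higher Education Institutions of the Liaoning Provincial Department of Education (2004C031); Liaoning Normal University Fund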
Keywords: continuous attribute discretization; Chi2 algorithm; rough sets; difference sequence