Abstract
The Chi2 family of algorithms, which includes the Modified Chi2 algorithm and the Extended Chi2 algorithm (the most recent algorithm in this area), is an important class of methods for discretizing continuous attributes, grounded in probability and statistics. This paper analyzes the Chi2-related algorithms in depth, points out their shortcomings, and proposes a new discretization method for continuous attributes, the Rectified Chi2 algorithm. The new algorithm introduces a new criterion for interval merging and discretizes real-valued attributes more reasonably and more effectively. Furthermore, because all algorithms in the Chi2 family use only the maximal difference as the standard for interval merging, which is not always reasonable, an interval merging method based on a difference sequence is proposed; it can greatly improve the discretization results of the Chi2 family. Experimental results demonstrate the effectiveness of both algorithms.
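As background for the interval merging mentioned above, the following is a minimal sketch of the classical Pearson chi-square statistic that ChiMerge-style methods compute for two adjacent intervals. It is not the Rectified Chi2 merging criterion or the difference-sequence rule, neither of which is specified in this abstract. The function names `chi2_adjacent` and `most_similar_pair`, the class-count input layout, and the choice to skip cells with zero expected frequency are all assumptions made for illustration.

```python
from typing import List, Sequence


def chi2_adjacent(a: Sequence[int], b: Sequence[int]) -> float:
    """Pearson chi-square statistic for two adjacent intervals.

    a[j] and b[j] are the numbers of samples of class j that fall into
    the first and second interval (hypothetical input layout).
    """
    if len(a) != len(b):
        raise ValueError("both intervals need counts for the same classes")
    rows = (a, b)
    row_totals = [sum(a), sum(b)]
    col_totals = [x + y for x, y in zip(a, b)]
    n = sum(row_totals)
    if n == 0:
        return 0.0  # two empty intervals carry no evidence of a difference
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                continue  # assumption: cells with zero expectation are skipped
            stat += (observed - expected) ** 2 / expected
    return stat


def most_similar_pair(counts: List[Sequence[int]]) -> int:
    """Index i of the adjacent pair (i, i + 1) with the smallest statistic."""
    stats = [chi2_adjacent(counts[i], counts[i + 1])
             for i in range(len(counts) - 1)]
    return min(range(len(stats)), key=stats.__getitem__)


if __name__ == "__main__":
    # Three intervals, two classes: the first two intervals look alike.
    intervals = [[10, 0], [9, 1], [0, 10]]
    print(most_similar_pair(intervals))
```

A low statistic indicates that the two intervals have similar class distributions and are therefore candidates for merging; how the threshold or the merging order is chosen is exactly where the algorithms in the Chi2 family differ.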
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2009, Issue 3, pp. 524-529 (6 pages)
Journal of Chinese Computer Systems
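Funding
Supported by the National Natural Science Foundation of China (60372071)
Supported by the Scientific Research Fund for Higher Education Institutions of the Liaoning Provincial Department of Education (2004C031)
Supported by the Liaoning Normal University Fund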