Abstract
This paper discusses four classes of discretization algorithms for continuous attributes: greedy algorithms and their improved variants, algorithms based on attribute significance, algorithms based on information entropy, and clustering-based algorithms. Experiments compare the discretization quality of the four classes. The results indicate that the quality of discretizing a dataset depends not only on the algorithm used, but is also closely related to the distribution of the dataset's continuous attributes and to the classification of the decision attribute values.
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2007, No. 9, pp. 28-30, 33 (4 pages in total)
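Funding
National Natural Science Foundation of China (70471046)
Doctoral Program Foundation of the Ministry of Education of China (20040359004)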
Keywords
discretization
greedy algorithm
significance of attributes
entropy of information
clustering