Abstract
Existing rough-set-based discretization algorithms struggle to achieve both high efficiency and a high recognition rate. This paper proposes an information-entropy discretization algorithm for rough sets based on ranking means clustering. First, an improved ranking means clustering algorithm clusters the candidate cuts of each attribute by their information entropy values, producing a new, smaller candidate cut set. Then an information-entropy-based discretization algorithm selects the cuts from this set and discretizes the continuous-valued attributes. Experimental results show that, at a comparable recognition rate, the method has a lower time cost than traditional discretization methods.
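The candidate-reduction stage described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the improved ranking means clustering, so plain 1-D k-means on the entropy values stands in for it, and keeping the lowest-entropy cut of each cluster is likewise an assumption. All function names (`entropy`, `candidate_cuts`, `cut_entropy`, `reduce_cuts`) and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one attribute."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def cut_entropy(values, labels, cut):
    """Weighted class entropy of the two-way split induced by `cut`."""
    left = [y for x, y in zip(values, labels) if x < cut]
    right = [y for x, y in zip(values, labels) if x >= cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def reduce_cuts(values, labels, k, iters=20):
    """Shrink the candidate cut set of one attribute.

    Assumption: plain 1-D k-means on the cuts' entropy values replaces the
    paper's improved ranking means clustering, and each cluster is
    represented by its lowest-entropy cut.
    """
    scored = [(cut_entropy(values, labels, c), c) for c in candidate_cuts(values)]
    if len(scored) <= k:
        return sorted(c for _, c in scored)
    ents = sorted(e for e, _ in scored)
    # Initial centers: evenly spaced quantiles of the sorted entropy values.
    centers = [ents[(2 * i + 1) * len(ents) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(max(1, iters)):
        # Assign every cut to the center nearest to its entropy value.
        groups = [[] for _ in range(k)]
        for e, c in scored:
            j = min(range(k), key=lambda i: abs(e - centers[i]))
            groups[j].append((e, c))
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [
            sum(e for e, _ in g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Keep one representative (lowest-entropy) cut per non-empty cluster.
    return sorted(min(g)[1] for g in groups if g)


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
reduced = reduce_cuts(values, labels, k=3)
```

The reduced set would then be handed to an entropy-based cut-selection step, which is where a smaller candidate set saves time; that second stage is not sketched here.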
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2010, No. 9, pp. 3368-3371 (4 pages)
Keywords
rough set
discretization
continuous-valued attributes
ranking means clustering
information entropy