Abstract
To address the discretization of continuous attributes in rough set theory, a discretization method based on cut point selection is proposed. First, the condition attributes are ranked by importance, and effective heuristic rules are chosen as the basis for obtaining near-optimal cut points. Then the discretized data are generated, using information entropy and the consistency of the decision table as constraints. Finally, the algorithm is evaluated on UCI data sets and compared experimentally with other algorithms. The results show that the algorithm is effective and that it remains computationally efficient when attribute values occur with high frequency and the number of samples is large.
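The abstract names the ingredients (candidate cut points, information entropy, decision table consistency) but does not give the algorithm itself. The following is only a minimal sketch of how those ingredients are commonly combined, not the paper's method. Everything in it is an assumption made for illustration: the function names, the use of midpoints as candidate cuts, the global ranking of cuts by weighted class entropy (standing in for the paper's attribute importance ordering and heuristic rules), and the stopping threshold `target`.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def split_entropy(values, labels, cut):
    """Weighted class entropy after splitting one attribute at `cut`."""
    left = [y for x, y in zip(values, labels) if x < cut]
    right = [y for x, y in zip(values, labels) if x >= cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def discretize(rows, cuts):
    """Map each row to a tuple of interval indices, one per attribute."""
    return [tuple(sum(x >= c for c in cuts[j]) for j, x in enumerate(row))
            for row in rows]


def consistency(rows, labels, cuts):
    """Fraction of objects whose discretized condition class has one decision."""
    groups = {}
    for key, y in zip(discretize(rows, cuts), labels):
        groups.setdefault(key, []).append(y)
    consistent = sum(len(ys) for ys in groups.values() if len(set(ys)) == 1)
    return consistent / len(labels)


def greedy_cuts(rows, labels, target=1.0):
    """Add lowest-entropy cuts one at a time until consistency reaches `target`.

    Assumption: a plain entropy ranking replaces the paper's heuristic rules.
    """
    n_attrs = len(rows[0])
    cuts = [[] for _ in range(n_attrs)]
    pool = []
    for j in range(n_attrs):
        column = [r[j] for r in rows]
        for c in candidate_cuts(column):
            pool.append((split_entropy(column, labels, c), j, c))
    pool.sort()
    for _, j, c in pool:
        if consistency(rows, labels, cuts) >= target:
            break
        cuts[j].append(c)
    return [sorted(cs) for cs in cuts]
```

With a toy decision table such as `rows = [(1.0, 5.0), (2.0, 5.0), (3.0, 5.0), (4.0, 5.0)]` and `labels = [0, 0, 1, 1]`, a call to `greedy_cuts(rows, labels)` keeps a single cut on the first attribute and none on the second, since that one cut already makes the discretized table consistent.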
Source
《武汉理工大学学报(交通科学与工程版)》
2011, No. 2, pp. 297-300 (4 pages)
Journal of Wuhan University of Technology (Transportation Science & Engineering)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 70471031)
Keywords
rough set
information entropy
heuristic rules
continuous attributes
discretization