Abstract
Discretization of continuous attributes has long been one of the key open problems in machine learning. This paper proposes a discretization algorithm based on the importance of cut points. It first introduces several basic concepts of rough set theory: decision tables, the indiscernibility relation, information entropy, and conditional entropy. It then describes the discretization problem and defines the conditional entropy of a cut point with respect to classification. On this basis, it presents a rough-set algorithm that discretizes continuous attributes by selecting cut points. Simulation results show that the overall performance of the algorithm is better than that of comparable algorithms reported in the literature.
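The abstract does not reproduce the paper's definition of cut-point importance, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard conditional entropy H(D | C) of the decision attribute given the partition induced by a set of cuts, takes candidate cuts at midpoints between adjacent attribute values, and greedily adds the cut that lowers the entropy most. All function names (`conditional_entropy`, `candidate_cuts`, `greedy_discretize`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, decisions, cuts):
    """H(D | partition induced by cuts), in bits.

    rows: list of tuples of continuous attribute values
    decisions: list of decision labels, same length as rows
    cuts: set of (attribute_index, threshold) pairs
    """
    blocks = defaultdict(Counter)
    for row, d in zip(rows, decisions):
        # Objects on the same side of every cut are indiscernible.
        key = tuple(row[a] > t for a, t in sorted(cuts))
        blocks[key][d] += 1
    n = len(rows)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / n) * (c / size) * log2(c / size)
    return h


def candidate_cuts(rows):
    """Assumed candidates: midpoints between adjacent distinct values."""
    cuts = []
    for a in range(len(rows[0])):
        values = sorted({row[a] for row in rows})
        cuts.extend((a, (lo + hi) / 2) for lo, hi in zip(values, values[1:]))
    return cuts


def greedy_discretize(rows, decisions):
    """Add the most entropy-reducing cut until no single cut helps."""
    chosen = set()
    remaining = candidate_cuts(rows)
    current = conditional_entropy(rows, decisions, chosen)
    while remaining and current > 1e-12:
        best = min(
            remaining,
            key=lambda c: conditional_entropy(rows, decisions, chosen | {c}),
        )
        best_h = conditional_entropy(rows, decisions, chosen | {best})
        if best_h >= current - 1e-12:
            break  # no remaining cut improves the partition
        chosen.add(best)
        remaining.remove(best)
        current = best_h
    return sorted(chosen)
```

For a toy decision table with one attribute, `rows = [(1.0,), (2.0,), (3.0,), (4.0,)]` and `decisions = ["a", "a", "b", "b"]`, the entropy with no cuts is 1 bit, and the single cut at 2.5 separates the two classes completely. The greedy stopping rule is a simplification: it can stop early on data where only a combination of cuts reduces the entropy.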
Source
Modern Electronics Technique (《现代电子技术》)
2007, No. 2, pp. 77-79 (3 pages)
Keywords
rough set
discretization
cut point
conditional entropy