Abstract
Discretization of continuous attribute values has long been one of the key open problems in machine learning. It matters because it speeds up subsequent learning algorithms, reduces their actual space and time requirements, and improves the clustering capability of the learned results. This paper first analyzes the characteristics and basic framework of discretization methods based on the rough set model and examines how the importance of a candidate cut can be measured. On that basis, it proposes two new heuristic algorithms for selecting the final discretization cuts from a candidate set. Both algorithms reflect the basic characteristics and strengths of rough set theory: the cuts they select preserve the discernibility relation of the information system, and they can achieve fairly good discretization results.
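To illustrate the general idea of choosing cuts from a candidate set while keeping objects discernible, here is a minimal Python sketch of a generic greedy heuristic. It is not the pair of algorithms proposed in the paper, whose details the abstract does not give. The function names (`candidate_cuts`, `separates`, `greedy_cuts`), the use of midpoints as candidate cuts, and the importance measure (the number of still-undiscerned object pairs a cut separates) are all assumptions made for illustration.

```python
from itertools import combinations


def candidate_cuts(rows, attr):
    """Hypothetical helper: midpoints between consecutive distinct values."""
    values = sorted({row[attr] for row in rows})
    return [(attr, (a + b) / 2) for a, b in zip(values, values[1:])]


def separates(cut, x, y):
    """A cut (attr, c) discerns x and y if they fall on opposite sides of c."""
    attr, c = cut
    return (x[attr] < c) != (y[attr] < c)


def greedy_cuts(rows, labels):
    """Greedily pick cuts until every discernible pair of objects with
    different decision values is separated by at least one chosen cut."""
    cuts = [c for a in range(len(rows[0])) for c in candidate_cuts(rows, a)]
    # Pairs with different decisions that at least one candidate cut can discern.
    pairs = {
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if labels[i] != labels[j]
        and any(separates(c, rows[i], rows[j]) for c in cuts)
    }
    chosen = []
    while pairs:
        # Assumed importance measure: remaining pairs separated by the cut.
        best = max(
            cuts,
            key=lambda c: sum(separates(c, rows[i], rows[j]) for i, j in pairs),
        )
        chosen.append(best)
        pairs = {(i, j) for i, j in pairs if not separates(best, rows[i], rows[j])}
    return chosen


rows = [(1.0, 5.0), (2.0, 5.0), (3.0, 6.0), (4.0, 6.0)]
labels = [0, 0, 1, 1]
selected = greedy_cuts(rows, labels)
```

In this toy decision table, a single cut on the first attribute is enough to separate every pair of objects with different decisions, so the greedy loop stops after one iteration. A different importance measure would change which cut is picked when several candidates tie.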
Source
Journal of Chongqing University (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journals
2002, No. 3, pp. 18-21 (4 pages)
Journal of Chongqing University
Funding
National Natural Science Foundation of China (69803014)
Climbing Program Special Support Fund
Key Research Project Fund of the Chongqing Science and Technology Commission