Abstract
In the Class-Attribute Interdependency Maximization (CAIM) algorithm, the discretization criterion considers only the interdependency between the attribute and the leading class (the class with the most values) within each interval. In other words, it only favors maximizing the number of values that belong to that leading class. This causes CAIM to over-discretize, which produces unreasonable intervals and lowers the predictive accuracy of the resulting classifier. This paper proposes an improved CAIM algorithm. It discretizes attributes in ascending order of attribute importance and, based on rough set theory, introduces the concept of condition-attribute discernibility rate. The discernibility rate and the approximation accuracy jointly control the final degree of discretization of the information table, which effectively resolves the over-discretization problem. In experiments, C4.5 and support vector machines (SVM) are applied to recognition and classification prediction on the discretized data, and the results demonstrate the effectiveness of the proposed algorithm.
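The two quantities the abstract relies on can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. `caim_criterion` computes the standard CAIM value, the average over the n intervals of max_r² / M_r (max_r is the count of the leading class in interval r, M_r the total count in interval r), which makes visible why CAIM rewards only the leading class of each interval. `approximation_accuracy` computes Pawlak's accuracy of approximation of the classification, Σ|lower(X_i)| / Σ|upper(X_i)|, on the discretized table; treating this as the abstract's "approximation accuracy" is an assumption. The function names, the interval convention (d_{r-1}, d_r], and skipping empty intervals are illustrative choices. The condition-attribute discernibility rate is not defined in this abstract, so it is not sketched.

```python
import bisect
from collections import Counter, defaultdict


def caim_criterion(values, labels, cut_points):
    """Standard CAIM value for one attribute discretized at sorted `cut_points`.

    CAIM = (1/n) * sum_r max_r**2 / M_r, where n is the number of intervals,
    max_r is the count of the leading (most frequent) class in interval r and
    M_r is the number of values in interval r. Hypothetical helper, not the
    paper's code.
    """
    counts = defaultdict(Counter)  # interval index -> class counts
    for v, c in zip(values, labels):
        # Assumed convention: interval r is (d_{r-1}, d_r].
        counts[bisect.bisect_left(cut_points, v)][c] += 1
    n = len(cut_points) + 1  # number of intervals
    # Empty intervals are skipped (they would divide by zero) but still count in n.
    total = sum(max(cc.values()) ** 2 / sum(cc.values()) for cc in counts.values())
    return total / n


def approximation_accuracy(rows, labels):
    """Pawlak accuracy of approximating the decision classes X_i by the
    indiscernibility relation of the (discretized) condition attributes:
    sum_i |lower(X_i)| / sum_i |upper(X_i)|. Hypothetical helper."""
    blocks = defaultdict(list)  # equivalence class (identical rows) -> labels
    for row, c in zip(rows, labels):
        blocks[tuple(row)].append(c)
    lower = upper = 0
    for cls in set(labels):
        for members in blocks.values():
            k = members.count(cls)
            if k:  # block intersects X_cls -> part of the upper approximation
                upper += len(members)
                if k == len(members):  # block lies inside X_cls -> lower approximation
                    lower += len(members)
    return lower / upper


values, labels = [1, 2, 3, 4, 5, 6], list("aaabbb")
print(caim_criterion(values, labels, []))    # 1.5
print(caim_criterion(values, labels, [3]))   # 3.0
rows = [[bisect.bisect_left([3], v)] for v in values]  # discretize with cut at 3
print(approximation_accuracy(rows, labels))  # 1.0
```

For values 1 to 6 labeled a, a, a, b, b, b, a single cut at 3 yields a CAIM value of 3.0 versus 1.5 with no cut, and the approximation accuracy of the discretized table rises from 0.0 (all rows indiscernible) to 1.0.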
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2010, No. 4, pp. 77-78, 81 (3 pages)
Funding
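National Natural Science Foundation of China (60372071)
Open Project Fund of the Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences (20070101)
Scientific Research Fund for Higher Education Institutions of the Liaoning Provincial Department of Education (2008344)
Science and Technology Plan Fund of the Dalian Municipal Science and Technology Bureau (2007A10GX117)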
Keywords
discretization of continuous attributes
rough set
attribute discernibility rate