Abstract
When computing a reduct, a heuristic algorithm iteratively adds the attribute with the highest importance. However, this approach ignores the fact that data perturbation directly causes the computed importance values to fluctuate, which in turn makes the resulting reduct unstable. To address this problem, a heuristic algorithm framework based on ensemble attribute importance is proposed. First, multiple samplings are drawn from the raw data. Second, in each iteration, the importance of each attribute is computed on every sample, and these importance values are aggregated (ensembled). Finally, the attribute with the largest ensemble importance is added to the reduct. Experimental results obtained with the neighborhood rough set method show that the ensemble-importance-based attribute reduction algorithm not only obtains more stable reducts, but also yields classification results with higher consistency when those reducts are used.
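The three steps of the framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the importance measure, sampling scheme, aggregation rule, or stopping criterion. Here we assume the common neighborhood rough set dependency gamma_B(D) = |POS_B(D)| / |U| with Euclidean distance and radius `delta`, significance Sig(a, B) = gamma_{B+a} - gamma_B, random subsampling without replacement, aggregation by the arithmetic mean, and stopping once the reduct's dependency on the full data reaches that of all attributes. The function and parameter names (`dependency`, `ensemble_reduct`, `delta`, `n_samples`, `ratio`, `seed`) are hypothetical.

```python
import random
from statistics import mean


def dependency(X, y, attrs, delta):
    """Neighborhood dependency gamma_B(D) (assumed measure): the fraction of
    samples whose delta-neighborhood, using Euclidean distance over the
    attributes in `attrs`, contains only samples of the same class."""
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            dist = sum((X[i][a] - X[j][a]) ** 2 for a in attrs) ** 0.5
            if dist <= delta and y[j] != y[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / n


def ensemble_reduct(X, y, delta=0.2, n_samples=5, ratio=0.8, seed=0):
    """Greedy forward selection in which each candidate's importance is the
    mean significance gamma_{B+a} - gamma_B over several random subsamples
    (hypothetical sketch; sampling and averaging are assumptions)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    size = max(1, int(ratio * n))
    # Step 1: multiple sampling (assumed: subsampling without replacement).
    parts = []
    for _ in range(n_samples):
        idx = rng.sample(range(n), size)
        parts.append(([X[i] for i in idx], [y[i] for i in idx]))

    target = dependency(X, y, list(range(m)), delta)
    reduct, remaining = [], list(range(m))
    while remaining and dependency(X, y, reduct, delta) < target:
        # Step 2: significance of each candidate on every sample, then averaged.
        base = [dependency(Xs, ys, reduct, delta) for Xs, ys in parts]
        scores = {
            a: mean(dependency(Xs, ys, reduct + [a], delta) - b
                    for (Xs, ys), b in zip(parts, base))
            for a in remaining
        }
        # Step 3: add the attribute with the largest ensemble importance.
        best = max(remaining, key=scores.__getitem__)
        reduct.append(best)
        remaining.remove(best)
    return reduct


# Toy data: attribute 0 separates the classes, attribute 1 is noisy,
# attribute 2 is constant.
X = [[0.0, 0.3, 0.5], [0.1, 0.9, 0.5], [0.05, 0.1, 0.5],
     [1.0, 0.2, 0.5], [0.9, 0.8, 0.5], [0.95, 0.6, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(ensemble_reduct(X, y))  # [0]
```

Averaging per-sample significances means a single perturbed sample shifts a candidate's score by only 1/`n_samples` of its deviation, which is the intuition behind the stability claim; other aggregation rules (for example, the median or voting) would fit the same loop.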
Authors
LI Jingzheng (李京政)
YANG Xibei (杨习贝)
DOU Huili (窦慧莉)
WANG Pingxin (王平心)
CHEN Xiangjian (陈向坚)
Affiliations: School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212003, China; School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China; School of Mathematics and Physics, Jiangsu University of Science and Technology, Zhenjiang 212003, China
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, Issue 3, pp. 414-421 (8 pages)
Funding
National Natural Science Foundation of China (61572242, 61503160, 61502211)
Philosophy and Social Science Research Project of Jiangsu Higher Education Institutions (2015SJD769)
China Postdoctoral Science Foundation (2014M550293)
Keywords
attribute reduction
classification
clustering
data perturbation
ensemble
heuristic algorithm
neighborhood rough set
stability