Abstract
Many attribute selection schemes hinge on using a given measure both as the attribute evaluation criterion and as the constraint of a heuristic algorithm. However, without any evaluation of attribute similarity, and with a naive one-by-one selection mechanism, attribute traversal is redundant and therefore very time-consuming. Moreover, a single measure restricts the perspective of attribute evaluation, making it difficult to find attributes with high learning performance. To address this, a framework for attribute selection is proposed, in which: 1) attributes are grouped according to attribute granularity and the knowledge distance between attributes, so that attributes within a group differ markedly and attributes across groups have strong discriminative power; traversal is then carried out group by group, which compresses the search space of candidate attributes and improves selection efficiency; 2) the proposed restricted Pareto optimality principle is used to select attribute groups iteratively until the desired attribute subset is obtained. Experiments on 12 UCI datasets, with attribute noise injected at 4 different ratios, show that, compared with 8 popular methods, the attribute selection results of the proposed method improve classification stability by 5.89% on average, improve classification accuracy by 12.28% on average, and reduce time consumption by 59.27% on average.
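As a rough illustration of the multi-measure idea in step 2), the sketch below filters attribute groups by ordinary Pareto dominance over two scores. It is not the restricted Pareto optimality principle proposed in the paper, whose definition the abstract does not give; the function name `pareto_front`, the group labels, and the score values are all hypothetical.

```python
from typing import Dict, List, Tuple


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the keys whose score vectors are not dominated (higher is better).

    Plain Pareto dominance, used only as an illustration; the paper's
    "restricted" variant is not specified in the abstract.
    """
    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is at least as good everywhere and better somewhere
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        key for key, vec in scores.items()
        if not any(dominates(other, vec) for k, other in scores.items() if k != key)
    ]


# Hypothetical attribute groups, each scored by two made-up evaluation measures.
group_scores = {
    "g1": (0.9, 0.6),
    "g2": (0.7, 0.9),
    "g3": (0.6, 0.5),  # worse than g1 on both measures, so it is dominated
}

candidates = pareto_front(group_scores)
```

Here `candidates` keeps `g1` and `g2`, since neither is better than the other on both measures, and drops `g3`. An iterative scheme would presumably pick from such a non-dominated set at each round, but how the paper restricts and iterates the selection is not stated here.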
Authors
印振宇
王平心
杨习贝
于化龙
钱宇华
YIN Zhen-yu; WANG Ping-xin; YANG Xi-bei; YU Hua-long; QIAN Yu-hua (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; School of Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China; School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China)
Source
《控制与决策》
EI
CSCD
Peking University Core Journal
2024, No. 9, pp. 2959-2968 (10 pages)
Control and Decision
Funding
National Natural Science Foundation of China (62076111)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX22_1905).
Keywords
attribute selection
granularity
heuristic algorithm
heuristic information
neighborhood rough set
Pareto optimality