Abstract
With the explosive growth of data volume, traditional attribute reduction algorithms are inefficient on massive data. They also rely on a single evaluation metric and tend to fall into local optima, which yields low-quality attribute subsets. To address these problems, this paper designs an efficient attribute reduction algorithm for big data scenarios. First, it introduces three evaluation indicators from the rough hypercuboid method to quantify the relationships between attributes and uses them to evaluate every candidate attribute comprehensively. Considering these indicators jointly yields a more compact and more discriminative attribute subset. Second, to overcome the low efficiency of traditional heuristic search strategies and their tendency to become trapped in local optima, the paper improves the binary grey wolf optimizer by introducing a pheromone matrix that guides the search process and strengthens the algorithm's search ability. The pheromone matrix supplies global information about attribute subsets, which helps the algorithm explore the solution space more effectively and avoid local optima. Experiments show that the proposed algorithm scales effectively to large datasets, achieves significant performance improvements on classification tasks across different datasets, and exhibits strong adaptability.
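The abstract names the ingredients (rough hypercuboid evaluation, a binary grey wolf optimizer, a pheromone matrix) but not their exact definitions, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes the common rough hypercuboid dependency measure, in which each class spans an axis-aligned [min, max] box over the selected attributes and an object is "confused" when it falls inside more than one class box. Only this single dependency measure is used here, not the paper's three indicators. The fitness weighting, the 2×d pheromone table (one entry per bit value per attribute), the mixing weight `w`, the evaporation rate `rho`, and the names `hypercuboid_dependency`, `fitness` and `pheromone_bgwo` are all hypothetical.

```python
import numpy as np

def hypercuboid_dependency(X, y, mask):
    """Assumed rough hypercuboid dependency: 1 - fraction of objects lying
    inside the [min, max] box of more than one class on the selected attributes."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    inside = np.empty((len(classes), len(y)), dtype=bool)
    for i, c in enumerate(classes):
        lo, hi = Xs[y == c].min(axis=0), Xs[y == c].max(axis=0)
        # an object is in the class hypercuboid only if it is inside on every attribute
        inside[i] = np.all((Xs >= lo) & (Xs <= hi), axis=1)
    confused = inside.sum(axis=0) > 1
    return float(1.0 - confused.mean())

def fitness(X, y, mask, weight=0.9):
    # Hypothetical trade-off: reward dependency, lightly reward smaller subsets.
    return weight * hypercuboid_dependency(X, y, mask) + (1 - weight) * (1 - mask.sum() / mask.size)

def pheromone_bgwo(X, y, n_wolves=10, n_iter=30, rho=0.1, w=0.5, seed=0):
    """Binary grey wolf search whose bit probabilities are blended with a
    hypothetical pheromone table tau[b, j] for bit value b of attribute j."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    wolves = rng.integers(0, 2, size=(n_wolves, d)).astype(float)
    tau = np.ones((2, d))
    best_mask, best_score = None, -np.inf
    for t in range(n_iter + 1):
        scores = np.array([fitness(X, y, wv.astype(bool)) for wv in wolves])
        k = int(np.argmax(scores))
        if scores[k] > best_score:  # keep the best subset seen so far
            best_mask, best_score = wolves[k].astype(bool), float(scores[k])
        if t == n_iter:
            break
        order = np.argsort(-scores)
        leaders = wolves[order[:3]]  # alpha, beta, delta wolves
        a = 2 - 2 * t / n_iter       # decreases linearly from 2 towards 0
        new = np.empty_like(wolves)
        for i, wv in enumerate(wolves):
            guided = []
            for lead in leaders:
                A = 2 * a * rng.random(d) - a
                C = 2 * rng.random(d)
                guided.append(lead - A * np.abs(C * lead - wv))
            x = np.mean(guided, axis=0)
            p_gwo = 1 / (1 + np.exp(-10 * (x - 0.5)))   # sigmoid transfer function
            p_tau = tau[1] / tau.sum(axis=0)            # pheromone share of bit value 1
            p = w * p_gwo + (1 - w) * p_tau
            new[i] = (rng.random(d) < p).astype(float)
        wolves = new
        tau *= 1 - rho                                 # evaporation
        alpha_bits = leaders[0].astype(int)
        tau[alpha_bits, np.arange(d)] += scores[order[0]]  # deposit on the alpha's bits
    return best_mask, best_score
```

On a toy table where only the first attribute separates the two classes, `hypercuboid_dependency` gives 1.0 for that attribute alone. Blending the sigmoid probability with the pheromone share is one simple way to let accumulated global information bias individual bit flips. The paper's actual pheromone update may differ.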
Authors
YAO Hongcheng
DING Weiping
JU Hengrong
HUANG Jiashuang
JIANG Shu
CHEN Yuepeng
School of Information Science and Technology, Nantong University, Nantong 226019, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2024, No. 12, pp. 2898-2907 (10 pages)
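Funding
National Natural Science Foundation of China (61976120, 62006128, 62102199)
Natural Science Foundation of Jiangsu Province (BK20231337)
Jiangsu Provincial Innovation and Entrepreneurship Doctor Program ((2020)30986)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_3393)
Major Project of Natural Science Research of Jiangsu Higher Education Institutions (21KJA510004)
General Project of Natural Science Research of Jiangsu Higher Education Institutions (23KJB520031)
Basic Science Research Program of Nantong Science and Technology Bureau (JC2021122)
China Postdoctoral Science Foundation (2022M711716)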