
Study on Set-Valued Data Balancing Method by Semi-Monolayer Covering Rough Set (Cited by: 3)
Abstract Imbalanced data now appear in every area of life, and classifying them effectively has become an active research topic. Traditional over-sampling and under-sampling methods can balance the data, but they cannot overcome the effects that data distribution and noise have on classification. To reduce the influence of data distribution and noise on the classification of imbalanced data in set-valued information systems, a model that combines over-sampling and under-sampling based on the semi-monolayer covering rough set is proposed. The data are divided into two main parts by the DA0 and DE0 lower approximations of the semi-monolayer covering rough set: the part belonging to the lower approximation set is over-sampled with BorderlineSMOTE, the part not belonging to the lower approximation set is under-sampled with ClusterCentroids, and the two results are merged to form the final data set. The semi-monolayer covering rough set is a fast-to-compute model with high approximation quality that is suited to set-valued information systems. Its high approximation quality lets it retain as much reliable data as possible, which supports the generalization capability of the model. The hybrid approach not only reduces the impact of noisy data on BorderlineSMOTE but also, through ClusterCentroids, largely preserves the information in the filtered-out data. Comparative experiments using ExtraTree, DecisionTree, and FGCNN verify the effectiveness of the model.
Authors WU Zhengjiang; YANG Tian; ZHENG Ailing; MEI Qiuyu; ZHANG Yaning (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454003, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2022, Issue 19, pp. 166-173 (8 pages)
Funding National Natural Science Foundation of China (61972134, 11601129).
Keywords semi-monolayer covering rough set; imbalanced data; approximation set; hybrid approach; over-sampling; under-sampling
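
The splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the DA0 and DE0 lower approximations, so this example swaps in the classical tolerance-relation lower approximation for set-valued data (two objects are tolerant when their value sets overlap on every attribute). The names `tolerant` and `split_by_lower_approximation` and the toy data are hypothetical, and every value set is assumed to be non-empty.

```python
from typing import FrozenSet, List, Sequence, Tuple

# One object = one set of values per attribute (set-valued data).
Row = Tuple[FrozenSet[str], ...]


def tolerant(a: Row, b: Row) -> bool:
    """Stand-in relation: value sets overlap on every attribute."""
    return all(x & y for x, y in zip(a, b))


def split_by_lower_approximation(
    rows: Sequence[Row], labels: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (indices inside the lower approximation of their own class,
    indices outside it), using the tolerance relation above.

    NOTE: this is NOT the paper's DA0/DE0 construction; it is the
    classical tolerance-based lower approximation, used for illustration.
    """
    inside: List[int] = []
    outside: List[int] = []
    for i, row in enumerate(rows):
        # Tolerance class of object i: every object tolerant with it.
        neighbours = [j for j, other in enumerate(rows) if tolerant(row, other)]
        # Object i is in the lower approximation of its class when its
        # whole tolerance class carries the same label.
        if all(labels[j] == labels[i] for j in neighbours):
            inside.append(i)
        else:
            outside.append(i)
    return inside, outside


# Toy set-valued table with two attributes (hypothetical data).
rows: List[Row] = [
    (frozenset({"en"}), frozenset({"a"})),
    (frozenset({"en", "fr"}), frozenset({"a", "b"})),
    (frozenset({"fr"}), frozenset({"b"})),
    (frozenset({"de"}), frozenset({"c"})),
]
labels = [0, 0, 1, 1]

inside, outside = split_by_lower_approximation(rows, labels)
print(inside, outside)  # [0, 3] [1, 2]
```

In the pipeline the abstract describes, the objects indexed by `inside` would be passed to an over-sampler such as `BorderlineSMOTE` and those indexed by `outside` to an under-sampler such as `ClusterCentroids` (both available in the imbalanced-learn package), and the two resampled parts would then be merged. That step needs a numeric encoding of the set values and is omitted here.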