Classification Strategy of Imbalanced Data in Manufacturing Process Based on Improved SMOTE (cited by: 2)
Abstract: Imbalanced data analysis is one of the key technologies of intelligent manufacturing, and the classification of imbalanced data has become a research hotspot in machine learning and data mining. In current oversampling strategies for imbalanced data, the synthetic samples tend to be marginalized and a separate noise-reduction step is required. To address this problem, the paper proposes an oversampling algorithm based on an improved SMOTE (synthetic minority oversampling technique) and the local outlier factor (LOF). The entire data set is first clustered with K-means, and high-reliability samples are selected for oversampling with the improved SMOTE algorithm; the LOF algorithm is then used to delete synthetic samples with large errors. Experimental results on four UCI imbalanced data sets show that the method has a stronger ability to classify the minority class and effectively overcomes the data marginalization problem. Applied to imbalanced data from phosphoric acid production, the algorithm classifies that data accurately.
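The three stages named in the abstract (K-means clustering, SMOTE oversampling of reliable samples, LOF filtering of synthetic samples) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not say how "high-reliability" samples are chosen or how SMOTE is improved, so the cluster-share rule in `reliable_minority`, the farthest-first initialization in `kmeans`, the classic SMOTE interpolation in `smote`, the choice of the selected minority samples as the LOF reference set, and the `lof_max` threshold are all assumptions introduced here.

```python
import math
import random

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def reliable_minority(X, y, labels, minority=1, min_share=0.5):
    """Hypothetical reliability rule: keep minority samples only from clusters
    in which the minority class makes up at least `min_share` of the members."""
    keep = []
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        share = sum(y[i] == minority for i in idx) / len(idx)
        if share >= min_share:
            keep += [X[i] for i in idx if y[i] == minority]
    return keep

def smote(seeds, n_new, k=3, seed=0):
    """Classic SMOTE: interpolate between a seed and one of its k nearest seeds."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(seeds))
        others = [s for j, s in enumerate(seeds) if j != i]
        b = rng.choice(sorted(others, key=lambda s: math.dist(seeds[i], s))[:k])
        t = rng.random()
        out.append(tuple(a + t * (bb - a) for a, bb in zip(seeds[i], b)))
    return out

def lof_scores(queries, ref, k=3):
    """Local outlier factor of each query point, measured against `ref`."""
    def knn(p, skip=None):
        idx = [j for j in range(len(ref)) if j != skip]
        return sorted(idx, key=lambda j: math.dist(p, ref[j]))[:k]

    # k-distance of every reference point (its own index is skipped).
    kdist = [math.dist(ref[i], ref[knn(ref[i], i)[-1]]) for i in range(len(ref))]

    def lrd(p, skip=None):
        # Local reachability density: inverse of the mean reachability distance.
        nb = knn(p, skip)
        reach = sum(max(kdist[j], math.dist(p, ref[j])) for j in nb)
        return len(nb) / max(reach, 1e-12)

    ref_lrd = [lrd(ref[i], i) for i in range(len(ref))]
    scores = []
    for p in queries:
        nb = knn(p)
        scores.append(sum(ref_lrd[j] for j in nb) / (len(nb) * lrd(p)))
    return scores

def oversample(X, y, n_new, minority=1, lof_max=1.5):
    """Cluster, pick reliable minority seeds, synthesize, drop high-LOF samples."""
    labels = kmeans(X, k=2)
    seeds = reliable_minority(X, y, labels, minority)
    synthetic = smote(seeds, n_new)
    scores = lof_scores(synthetic, seeds)
    return [s for s, sc in zip(synthetic, scores) if sc <= lof_max]
```

Here `X` is a list of numeric tuples and `y` a parallel list of class labels; `oversample(X, y, n_new)` returns only the synthetic minority samples that pass the LOF filter, to be appended to the training set. An LOF close to 1 means a point is about as dense as its neighbours, so the filter discards synthetic points that fall in sparser regions than the samples they were generated from.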
Authors: LI Xu (黎旭); CHEN Jiadui (陈家兑); WU Yongming (吴永明); ZONG Wenze (宗文泽). Affiliations: Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China; College of Mechanical Engineering, Guizhou University, Guiyang 550025, China; State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 16, pp. 284-291 (8 pages)
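Funding: Guizhou Provincial Science and Technology Support Program ((2017)2029, [2021] General 439); Guizhou Provincial Science and Technology Program (Qiankehe Platform-JXCX[2021]001).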
Keywords: imbalanced data; over-sampling; local outlier factor; clustering; synthetic minority oversampling technique (SMOTE)