Abstract
When a traditional support vector machine (SVM) performs binary classification on imbalanced data, the classification boundary tends to shift. Current approaches to the imbalanced-data problem work mainly at two levels: the data set and the algorithm. This paper proposes a data-level method that uses the ADASYN and SMOTE algorithms jointly to generate minority-class sample points. The method uses the K-nearest-neighbor algorithm to count minority-class and majority-class sample points, categorizes the minority-class points accordingly, and then applies ADASYN and SMOTE to the respective categories to synthesize new minority-class points. Finally, experiments verify the algorithm: ROC curves are used to compare it against synthesizing minority-class points with SMOTE alone or ADASYN alone, and the proposed algorithm attains the highest AUC value, which indicates that it can improve the effectiveness of imbalanced-data classification.
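The abstract does not give the categorization rule or say which category is passed to which algorithm, so the following is only a minimal sketch of the standard building blocks it names, not the authors' implementation. It assumes that the counting is done within each minority point's K-nearest neighborhood, and the function names (`majority_neighbor_counts`, `adasyn_allocation`, `smote_point`) and the toy data are hypothetical.

```python
import numpy as np

def majority_neighbor_counts(X, y, k=5, minority_label=1):
    """For each minority point, count majority-class points among its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    counts = np.empty(len(minority_idx), dtype=int)
    for j, i in enumerate(minority_idx):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the point itself from its own neighborhood
        nearest = np.argsort(dist)[:k]
        counts[j] = np.sum(y[nearest] != minority_label)
    return minority_idx, counts

def adasyn_allocation(counts, k, total):
    """ADASYN-style allocation: more synthetic points where majority neighbors dominate."""
    ratios = np.asarray(counts, dtype=float) / k
    if ratios.sum() == 0:
        return np.zeros(len(ratios), dtype=int)
    return np.rint(ratios / ratios.sum() * total).astype(int)

def smote_point(x, neighbor, rng):
    """SMOTE-style interpolation between a minority point and a minority neighbor."""
    return x + rng.random() * (neighbor - x)

# Toy data (assumed for illustration): four majority points, three minority points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [3, 3], [3.2, 3.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1])

idx, counts = majority_neighbor_counts(X, y, k=3)
per_point = adasyn_allocation(counts, k=3, total=10)
new_point = smote_point(X[5], X[6], np.random.default_rng(0))
```

A combined scheme along the lines of the abstract could use `counts` to split the minority points into groups and route each group to one of the two generators; the threshold for that split would have to come from the full paper.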
Authors
蒋华
江日辰
王鑫
王慧娇
JIANG Hua; JIANG Ri-chen; WANG Xin; WANG Hui-jiao (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541000, China)
Source
《计算机仿真》
Peking University Core Journal
2020, Issue 3, pp. 254-258, 420 (6 pages in total)
Computer Simulation
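Funding
2016 Guangxi Universities Basic Ability Improvement Project for Young and Middle-aged Teachers (ky2016YB150)
Guilin University of Electronic Technology Graduate Education Innovation Program (2017YJCX48).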