Abstract
Imbalanced datasets are often unevenly distributed and have indistinct class boundaries. To address this, an imbalanced-dataset classification algorithm based on a neighbor-density-improved support vector machine (NDSVM) is proposed. The algorithm first computes a neighbor density value for every sample in the majority class. Based on these values, it selects two groups of majority-class samples, one lying in the boundary region and one lying close to it, each equal in size to the minority class. Each group is then combined with the minority class to train an initial SVM classifier. Finally, the resulting SVMs and the remaining majority-class samples are used to iteratively optimize the initial classifier. Experiments on synthetic and UCI datasets show that NDSVM classifies well compared with WSVM, ALSMOTE-SVM and the standard SVM. It effectively improves SVM performance on datasets with uneven distributions and indistinct boundaries.
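The density-based selection step can be sketched in Python with NumPy. The abstract does not define neighbor density, so this sketch assumes a hypothetical definition. A majority sample's score is the fraction of its k nearest neighbours (Euclidean distance, whole dataset) that belong to the minority class. Ties are broken by distance to the nearest minority sample. The function names `neighbor_density` and `select_by_density` and the parameter `k` are illustrative, not the author's. The top n_min ranked majority samples form the boundary group, and the next n_min form the near-boundary group, where n_min is the minority-class size.

```python
import numpy as np


def neighbor_density(X, y, minority_label, k=5):
    """Hypothetical neighbor-density score for each majority-class sample.

    Assumed definition (not taken from the paper): the fraction of the
    sample's k nearest neighbours, over the whole dataset and excluding
    the sample itself, that belong to the minority class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y != minority_label)
    # Euclidean distances from every majority sample to every sample.
    d = np.linalg.norm(X[maj_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(maj_idx)), maj_idx] = np.inf  # ignore self-distance
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours per row
    scores = (y[nn] == minority_label).mean(axis=1)
    # Distance to the closest minority sample, used only as a tie-breaker.
    min_dist = d[:, y == minority_label].min(axis=1)
    return maj_idx, scores, min_dist


def select_by_density(X, y, minority_label, k=5):
    """Split majority indices into boundary, near-boundary and remaining groups."""
    y = np.asarray(y)
    maj_idx, scores, min_dist = neighbor_density(X, y, minority_label, k)
    n_min = int(np.sum(y == minority_label))
    # Primary key: score descending; secondary key: distance ascending.
    order = np.lexsort((min_dist, -scores))
    ranked = maj_idx[order]
    boundary = ranked[:n_min]
    near_boundary = ranked[n_min:2 * n_min]
    remaining = ranked[2 * n_min:]
    return boundary, near_boundary, remaining


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 0, 0, 1, 1]
    print(select_by_density(X, y, minority_label=1, k=2))
```

In the small example, the majority samples at 4.0 and 3.0 form the boundary group, and those at 2.0 and 1.0 form the near-boundary group. Each group would then be stacked with the minority samples to train an initial SVM, for example with scikit-learn's `sklearn.svm.SVC`. The `remaining` indices would feed the iterative refinement. The abstract does not specify that refinement rule, so it is not sketched here.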
Author
LIU Yueting (刘悦婷)
School of Media Engineering, Lanzhou University of Arts and Science, Lanzhou 730000, China
Source
Journal of Yanbian University (Natural Science Edition)
Indexed in: CAS
2018, Issue 1, pp. 43-48 (6 pages)
Funding
2015 Scientific Research Project of Higher Education Institutions of Gansu Province (2015B-132)
Keywords
support vector machine
imbalanced dataset
neighbor density
uneven distribution
boundary region