Abstract
To address the uneven data distribution and indistinct class boundaries typical of imbalanced datasets, this paper proposes NLDSVM (new local density support vector machine), an improved SVM classification algorithm based on local density. The algorithm first uses a hierarchical k-nearest-neighbor method to compute the local density of each majority-class sample. Based on these density values, it selects majority-class samples from the boundary region and from the region close to the boundary, equal in number to the minority-class samples, and trains the initial SVM classifier on them together with the minority class. The initial classifier is then iteratively optimized using the resulting support vector machine and the remaining majority-class samples. Experiments on artificial datasets and UCI datasets show that, compared with WSVM, ALSMOTE-SVM and the basic SVM, NLDSVM improves the average G-mean by 7%, the average F-measure by 6%, and the average AUC by 6%. NLDSVM classifies well and effectively improves the performance of SVM on imbalanced datasets with uneven distribution and indistinct boundaries.
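The three evaluation measures named in the abstract (G-mean, F-measure, AUC) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say which F-measure variant is used, so the balanced F1 form is assumed here; the minority class is assumed to be the positive class; and the function names `imbalance_metrics` and `auc` as well as the toy labels and scores are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (G-mean, F-measure), treating `positive` as the minority class.

    Assumption: F-measure is the balanced F1 score (beta = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class accuracy
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class accuracy
    precision = tp / (tp + fp) if tp + fp else 0.0

    # G-mean: geometric mean of the per-class accuracies.
    g_mean = math.sqrt(recall * specificity)
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return g_mean, f_measure


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.2, 0.1, 0.3, 0.8]

g_mean, f_measure = imbalance_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy case plain accuracy is 0.8 even though only half of the minority samples are recognized, which is why measures that weigh both classes are preferred over accuracy on imbalanced data.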
Authors
刘悦婷
金兆强
刘凯
孙志权
LIU Yueting, JIN Zhaoqiang, LIU Kai, SUN Zhiquan (School of Media Engineering, Lanzhou University of Arts and Science, Lanzhou 730000, China)
Source
《青海大学学报(自然科学版)》
2018, No. 2, pp. 26-32, 46 (8 pages in total)
Journal of Qinghai University (Natural Science)
Funding
2017 Lanzhou University of Arts and Science University-Level Seed Fund (Natural Science) Project (17XJZZ06)
Keywords
support vector machine (SVM) (支持向量机)
imbalanced dataset (不平衡数据集)
local density (局部密度)
uneven distribution (分布不均匀)
boundary region (边界区域)