Abstract
Traditional support vector machines (SVMs) cannot adequately detect minority-class instances in the region where classes overlap, nor can they classify imbalanced data sets reasonably. Class overlap and class imbalance are both common in complex data sets, so they degrade SVM classification performance. To address this, an improved model is proposed that replaces the slack variables of the SVM with a distance measure. To some extent, the model alleviates the difficulty SVMs have with class overlap and class imbalance in complex data sets. Finally, experiments on synthetic data sets and on data sets from the UCL database demonstrate the advantages of the proposed algorithm.
Source
Journal of Shaanxi University of Science & Technology (Natural Science Edition)
2017, No. 2, pp. 189-194 (6 pages)
Funding
Natural Science Foundation of Shanxi Province (Grant No. 2015011040)
Keywords
support vector machine
overlapping
imbalanced
slack variable
distance measure