Abstract
When the class sizes in the training set are uneven, the training errors of the C-SVM classifier are biased toward the class with fewer samples. In addition, when the sample set contains duplicated samples, C-SVM treats each copy as a new sample and processes it again, which increases training time. After analyzing the causes of both problems, this paper proposes a weighted support vector machine algorithm that compensates for the adverse effect of uneven class sizes and speeds up decisions on duplicated samples. To improve generalization, a genetic algorithm is used during training to select two parameters automatically: the penalty factor (regularization parameter) and the kernel width. Experimental results show that the proposed algorithm effectively controls the misclassification rates of the classes, handles duplicated samples, and yields a trained model with good generalization performance.
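The abstract does not spell out the weighting rule, so the following Python sketch only illustrates one plausible reading of it under stated assumptions. The hypothetical helper `weighted_training_set` gives each unique sample its own penalty C_i = C · w_y · k, where w_y = n / (n_classes · n_y) is the common "balanced" class weight (an assumption, not necessarily the paper's rule) and k is the number of times the sample occurs. Merging k identical copies into one sample with penalty k·C leaves the C-SVM primal objective unchanged, because all copies share the same slack value, so the solver has to handle each duplicate only once. The penalties act as per-sample upper bounds 0 ≤ α_i ≤ C_i in the dual; a solver that accepts per-sample weights, such as scikit-learn's `SVC.fit(X, y, sample_weight=...)`, which rescales C per sample, could consume them as `sample_weight = C_i / C`. The hypothetical `ga_tune` is a minimal real-coded genetic algorithm over log10 C and log10 σ; its operators (truncation selection, elitism, blend crossover, Gaussian mutation) and the `fitness` callable (for example, cross-validated accuracy of the weighted SVM) are illustrative choices, not the paper's.

```python
import math
import random
from collections import Counter


def weighted_training_set(X, y, C):
    """Collapse duplicate samples and attach a per-sample penalty C_i.

    Assumptions (illustrative only):
    - class weight w_y = n / (n_classes * n_y), the "balanced" heuristic;
    - a (features, label) pair seen k times is kept once with penalty
      multiplied by k, which gives the same primal objective as k copies.
    """
    class_sizes = Counter(y)
    n, n_classes = len(y), len(class_sizes)
    class_weight = {c: n / (n_classes * m) for c, m in class_sizes.items()}
    # (features, label) -> multiplicity, in order of first appearance
    counts = Counter(zip(map(tuple, X), y))
    X_u, y_u, C_u = [], [], []
    for (features, label), k in counts.items():
        X_u.append(list(features))
        y_u.append(label)
        C_u.append(C * class_weight[label] * k)
    return X_u, y_u, C_u


def ga_tune(fitness, bounds, pop_size=20, generations=30, mutation=0.3, seed=0):
    """Minimal real-coded GA over (log10 C, log10 sigma).

    fitness: hypothetical callable (C, sigma) -> score to maximize.
    bounds: ((lo, hi), (lo, hi)) limits on log10 C and log10 sigma.
    """
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return min(max(v, lo), hi)

    def score(ind):
        return fitness(10 ** ind[0], 10 ** ind[1])

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0][:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            t = rng.random()
            child = [t * p + (1 - t) * q for p, q in zip(a, b)]  # blend crossover
            child = [clip(g + rng.gauss(0, mutation), lo, hi)  # Gaussian mutation
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return 10 ** best[0], 10 ** best[1]


if __name__ == "__main__":
    Xu, yu, Cu = weighted_training_set(
        [[0, 0], [0, 0], [0, 1], [5, 5]], [1, 1, 1, -1], C=1.5)
    print(Xu, yu, Cu)  # three unique samples with their penalties
    # Toy fitness with a known optimum at C = 10, sigma = 10**-0.5,
    # standing in for cross-validated accuracy.
    toy = lambda C, s: -((math.log10(C) - 1) ** 2 + (math.log10(s) + 0.5) ** 2)
    print(ga_tune(toy, ((-2, 4), (-3, 2))))
```

Searching on a log scale is a common choice because both C and σ typically range over several orders of magnitude. With the balanced weights, the penalties summed over each class equal C · n / n_classes, so neither class dominates the total penalty regardless of its size.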
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 2, pp. 64-66, 221 (4 pages)
Funding
Basic Research Project of the Ministry of Communications (No. 200432922504)
Keywords
weighted support vector machines
uneven class sizes
duplicated samples
genetic algorithms
parameter tuning