Training sample selection algorithm for SVM based on modified weighted condensed nearest neighbor and close-to-boundary criterion

Cited by: 6
Abstract: Large-scale training sets usually contain many similar samples and a large amount of redundant information that is useless for constructing the classifier model. Training on all samples not only increases training time but may also reduce generalization ability because of over-fitting. To address this problem, this paper considers both the most representative samples and the samples closest to the boundary, and proposes a training sample selection algorithm for SVM based on a modified weighted condensed nearest neighbor (CNN) rule and the close-to-boundary criterion. Recognizing how strongly valuable training samples affect SVM classifier performance, the algorithm introduces subtractive clustering and uses the modified weighted CNN rule to select the most representative samples for training; it then applies the close-to-boundary criterion within random small pools to select boundary samples and improve classification accuracy. Experimental results on the UCI and KDD Cup 1999 datasets show that the proposed algorithm effectively removes redundant information from large training sets and achieves better classification performance with fewer samples.
Source: Journal of Yanshan University (《燕山大学学报》), CAS, 2010, No. 5, pp. 421-425 (5 pages)
Funding: National Natural Science Foundation of China (61071199); Natural Science Foundation of Hebei Province (F2010001297); China Postdoctoral Science Foundation (200902356, 20080440124)
Keywords: sample selection; weighted CNN; close-to-boundary criterion; random small pools; support vector machines
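
The two-stage idea in the abstract (first keep representative samples, then add samples near the class boundary drawn from random small pools) can be illustrated with a rough sketch. The abstract does not give the details of the modified weighted CNN rule or the subtractive clustering step, so this sketch substitutes Hart's classic unweighted condensed nearest neighbor rule for the first stage. The second stage is one possible reading of the close-to-boundary criterion: from each random pool, keep the sample whose nearest opposite-class neighbor is closest. The function names, pool sizes, and toy data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's classic CNN (a stand-in, not the paper's weighted variant):
    grow a subset until 1-NN on it classifies every sample correctly."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                # Misclassified by the current subset, so keep it.
                keep.append(i)
                changed = True
    return keep

def close_to_boundary(X, y, pool_size, n_pools, rng):
    """Assumed reading of the criterion: from each random small pool, pick
    the sample with the smallest distance to any opposite-class sample."""
    chosen = set()
    for _ in range(n_pools):
        pool = rng.choice(len(X), size=pool_size, replace=False)
        best, best_dist = None, np.inf
        for i in pool:
            others = X[y != y[i]]
            dist = np.min(np.linalg.norm(others - X[i], axis=1))
            if dist < best_dist:
                best, best_dist = int(i), dist
        chosen.add(best)
    return sorted(chosen)

# Toy two-class data (hypothetical, only to exercise the functions).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)),
               rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

rep = condensed_nn(X, y)                                 # representative samples
bnd = close_to_boundary(X, y, pool_size=20, n_pools=10, rng=rng)
selected = sorted(set(rep) | set(bnd))                   # reduced training set
print(len(X), len(rep), len(bnd), len(selected))
```

The index list `selected` would then be used to train the SVM on `X[selected]`, `y[selected]` instead of the full set. Restricting the boundary search to small pools keeps the second stage from collapsing onto only the few globally closest samples.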