Abstract
This paper studies the problem of training support vector machines (SVM) on very large data sets and proposes a block-based incremental algorithm, BISVM, that works in a manner similar to SMO. The algorithm learns the input data blocks one by one through an increase procedure and a decrease procedure, thereby avoiding the computational cost of conventional SVM training algorithms, which grows sharply on large data sets. Theoretical analysis shows that the new algorithm converges to an approximately optimal solution. Experimental results on the KDD data set indicate that the training time of BISVM is approximately linear in the problem size, while its generalization performance and number of support vectors are comparable to those of LIBSVM.
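The abstract describes BISVM only at a high level: each incoming data block is absorbed (the increase step) and the working set is then pruned (the decrease step) before the next block is processed. Below is a minimal sketch of this block-wise incremental scheme, assuming a simple interpretation in which "decrease" means retaining only the current support vectors; the function block_incremental_svm, the block_size parameter, and the use of scikit-learn's SVC are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of block-wise incremental SVM training in the spirit of
# the abstract: learn blocks one by one, carrying only support vectors forward.
# The real BISVM increase/decrease procedures are not specified here.
import numpy as np
from sklearn.svm import SVC

def block_incremental_svm(X, y, block_size=1000, **svm_params):
    """Train an SVM block by block, keeping only support vectors between blocks."""
    model = SVC(kernel="rbf", **svm_params)
    X_keep = np.empty((0, X.shape[1]))
    y_keep = np.empty((0,), dtype=y.dtype)

    for start in range(0, len(X), block_size):
        # "increase": merge the retained vectors with the next data block
        X_work = np.vstack([X_keep, X[start:start + block_size]])
        y_work = np.concatenate([y_keep, y[start:start + block_size]])

        model.fit(X_work, y_work)

        # "decrease": discard non-support vectors before the next block
        X_keep = X_work[model.support_]
        y_keep = y_work[model.support_]

    return model
```

Because only support vectors are carried between blocks, the working set stays small relative to the full data set, which is the intuition behind the near-linear training time claimed in the abstract.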
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2008, No. 1, pp. 98-100, 113 (4 pages)
Funding
Supported by the Sichuan Province Youth Software Innovation Fund (2005AA0827)
Keywords
support vector machines (SVM)
block-based incremental algorithm
large-scale training