Abstract
Support Vector Machine (SVM) is an effective pattern classification method, but on large data sets its training time becomes long and its generalization ability degrades. The Core Vector Machine (CVM) classification algorithm has a time complexity independent of the sample size, yet its training time grows rapidly as the number of support vectors increases. To address these problems, a two-stage fast learning algorithm combining CVM with SVM (CCS) is proposed. First, CVM performs a preliminary pass over the training samples: based on the Minimum Enclosing Ball (MEB), the potential core vectors are screened out to build a new training set containing the samples most likely to affect the solution, thereby reducing the sample size, and a labeling method is used to extract the new samples quickly. Then, SVM is trained on the resulting reduced training set. Experiments comparing CCS with SVM and CVM on six data sets show that CCS reduces training time by more than 30% on average while maintaining classification accuracy, making it an effective learning algorithm for large-scale classification.
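For illustration only, the following is a minimal sketch of the two-stage idea described above, written in Python with scikit-learn (an assumption; the paper does not specify an implementation). The MEB-based CVM screening of potential core vectors is approximated here by a cheap linear-SVM margin heuristic, and the data set, threshold, and parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

# Hypothetical synthetic data set standing in for the paper's benchmarks.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# Stage 1: cheap screening pass. The paper selects "potential core vectors" with an
# MEB-based CVM; here that step is replaced (as an assumption) by keeping only the
# samples that lie close to a linear decision boundary, i.e. those most likely to
# influence the final solution.
screen = LinearSVC(C=1.0, dual=False).fit(X, y)
margin = np.abs(screen.decision_function(X))
keep = margin < 1.5                      # threshold chosen for illustration only
X_small, y_small = X[keep], y[keep]

# Stage 2: train the full kernel SVM only on the reduced training set.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_small, y_small)
print(f"kept {keep.sum()} of {len(X)} samples; accuracy on full set = {clf.score(X, y):.3f}")
```

The point of the sketch is the structure of CCS rather than its exact components: an inexpensive first pass shrinks the training set to the samples most likely to matter, and the expensive kernel SVM is then trained only on that reduced set.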
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 2, pp. 419-424 (6 pages)
Keywords
Support Vector Machine (SVM)
classification
large-scale data set
Core Vector Machine (CVM)
Minimum Enclosing Ball (MEB)