Abstract
The key to incremental support vector machine (SVM) algorithms is pruning the historical sample set: selecting a subset that is as small as possible while still representing as much of the information in the historical samples as possible, and then training on this subset together with the newly added training samples. Liva Ralaivola [1] proposed retaining the nearest neighbors of the new samples to represent the historical sample set, but these nearest neighbors may contain redundant samples. The distance between a historical sample and the separating plane can be used to remove the redundant samples from the nearest-neighbor set. Based on this sample-plane distance, the MSPDISVM (minimum sample plane distance incremental support vector machine) algorithm is proposed. Experimental results show that MSPDISVM is faster than the algorithm proposed by Liva Ralaivola, with no substantial difference in accuracy. The sample-plane distance can therefore effectively remove redundant samples among the nearest neighbors of the new samples.
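The selection idea described in the abstract can be illustrated with a small sketch. This is not the paper's MSPDISVM procedure, whose exact criterion is not given in the abstract. It assumes a linear separating plane `w·x + b = 0`, Euclidean nearest neighbors, and a hypothetical threshold `max_plane_dist` below which a neighbor is kept; the function names and parameters are illustrative only.

```python
import math


def plane_distance(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def nearest_neighbors(x_new, history, k):
    """Indices of the k historical samples closest to x_new (Euclidean)."""
    order = sorted(range(len(history)),
                   key=lambda i: math.dist(x_new, history[i]))
    return order[:k]


def select_history_subset(history, new_samples, w, b, k=2, max_plane_dist=1.5):
    """Return sorted indices of historical samples to retrain with.

    Step 1: collect the k nearest historical neighbors of every new sample.
    Step 2 (assumed criterion): drop neighbors whose distance to the current
    separating plane exceeds max_plane_dist, treating them as redundant.
    """
    candidates = set()
    for x_new in new_samples:
        candidates.update(nearest_neighbors(x_new, history, k))
    return [i for i in sorted(candidates)
            if plane_distance(history[i], w, b) <= max_plane_dist]


# Toy data: the plane is x1 = 0, i.e. w = (1, 0), b = 0.
history = [(0.5, 0.0), (3.0, 0.0), (-0.4, 1.0), (-4.0, 1.0),
           (10.0, 10.0), (0.1, 20.0)]
new_samples = [(2.0, 0.5), (-2.5, 0.5)]

kept = select_history_subset(history, new_samples, w=(1.0, 0.0), b=0.0)
print(kept)  # [0, 2]
```

In this toy run, samples 0 to 3 are neighbors of the new samples, but only samples 0 and 2 lie close enough to the plane to be kept. The retained subset would then be combined with the new samples for retraining.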
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals (北大核心)
2012, No. 1, pp. 346-350 (5 pages)
Funding
Natural Science Foundation of Jiangsu Province (BK2009393)
Jiangsu Province Qing Lan Project, Academic Leader Fund (BK2009393)