Abstract
This paper proposes an online incremental learning algorithm for text categorization based on RBF support vector machines (SVMs). Exploiting the locality of the RBF kernel, the algorithm updates the current SVM by retraining only on those training samples that lie both within a neighborhood of the new training sample and inside a "possible band". To determine the size of this neighborhood adaptively and efficiently, the ξα generalization error estimator is applied to all available training samples to obtain a qualitative estimate of the current SVM's generalization error. An evolutionary factor of generalization ability is also introduced, so that the resulting SVM adjusts its classification performance automatically and its classification ability does not degrade. Comparative experiments on the real-world TREC-5 corpus show that the algorithm significantly accelerates incremental learning while preserving the classification performance of the resulting SVM.
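The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed `min_similarity` threshold, and the definition of the "possible band" as |f(x)| <= 1 + `band_width` are all assumptions made for illustration. In the paper the neighborhood size is chosen adaptively from the ξα estimate, which this sketch does not attempt.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, coeffs, bias, gamma):
    """SVM output f(x) = sum_i coeff_i * k(sv_i, x) + bias, where coeff_i = alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias

def select_retraining_subset(samples, new_x, support_vectors, coeffs, bias,
                             gamma, min_similarity, band_width):
    """Return indices of samples to retrain on when new_x arrives.

    Assumed criteria (hypothetical, for illustration only):
      - neighborhood: kernel similarity to new_x is at least min_similarity
      - possible band: |f(x)| <= 1 + band_width under the current SVM
    """
    selected = []
    for i, x in enumerate(samples):
        near = rbf_kernel(x, new_x, gamma) >= min_similarity
        f_x = decision_value(x, support_vectors, coeffs, bias, gamma)
        in_band = abs(f_x) <= 1.0 + band_width
        if near and in_band:
            selected.append(i)
    return selected
```

Because the RBF kernel decays quickly with distance, samples far from the new document contribute little to its kernel row, which is why restricting retraining to a local subset is plausible. Under these assumptions, the returned indices would be passed to an SVM solver such as SMO together with the new sample.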
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal
2005, Issue 5, pp. 11-15, 23 (6 pages in total)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60272088)
Keywords
computer application
Chinese information processing
text categorization
on-line learning
incremental learning
support vector machine (SVM)
SMO