Abstract
The support vector machine (SVM) is a novel and powerful machine learning method built on statistical learning theory, specifically VC-dimension theory and the principle of structural risk minimization. Its advantages include global optimality, a simple structure, and good generalization, and it has attracted growing attention in recent years. However, SVM is a small-sample learning method, and its own complexity and multicollinearity become the bottleneck when it is applied to large-scale data. Ridge regression is a modified least-squares estimation method: a biased estimator designed specifically for analyzing multicollinear data. When the independent variables are correlated with one another, it provides a more stable estimate than ordinary least squares. This paper combines ridge regression with SVM and applies the combination to text classification, one of the tasks of data mining. Experimental results show that the method improves both the training speed and the classification accuracy of SVM classification.
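The stability claim about ridge regression can be illustrated with a small numerical sketch. This is not the paper's method or data: the synthetic features, the penalty value `lam=1.0`, and the helper name `ridge_fit` are assumptions chosen only to show the textbook closed-form ridge estimator next to ordinary least squares on two nearly collinear variables.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Synthetic data (an assumption for illustration): x2 is almost a copy of x1,
# so the two columns of X are nearly collinear.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)  # true coefficients are (1, 1)

# Ordinary least squares: coefficients are poorly determined along x1 - x2.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge: the penalty shrinks the poorly determined direction toward zero.
beta_ridge = ridge_fit(X, y, lam=1.0)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

The ridge coefficient vector never has a larger norm than the least-squares one, and on collinear inputs like these it stays close to the true values, which is the sense in which the estimate is biased but more stable.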
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2008, Issue 2, pp. 229-234 (6 pages)
Journal of the China Society for Scientific and Technical Information
Keywords
ridge regression, support vector machine, data mining, text classification