Abstract
Support vector machines (SVMs) are a trainable machine learning method developed by Vapnik et al. on the basis of statistical learning theory. They are aimed mainly at learning from small samples and offer good generalization performance, convenient handling of high-dimensional data, adaptability, global optimization, short training times, and a complete theoretical foundation, and they have therefore attracted increasingly wide application and study. This paper applies semi-supervised learning to SVM-based text classification [1-2] and proposes a family of learning algorithms based on a new, geometric form of regularization that exploits the geometry of the marginal distribution. Although this framework covers the full range of learning problems from unsupervised to fully supervised, we focus on semi-supervised learning. We then discuss an extension of the approach to the SVM algorithm. Experimental results suggest that the resulting semi-supervised algorithms can make effective use of unlabeled data.
Source
《软件》 (Software)
2013, Issue 2, pp. 65-68 (4 pages)
Keywords
semi-supervised learning
regularization
kernel methods
manifold learning
unlabeled data
support vector machines
spectral graph theory