Abstract
With the rapid growth of the Internet, efficient text classification algorithms that can handle large-scale document collections have become very important. This paper presents a text classification algorithm based on a weighted proximal support vector machine (SVM) model, which is derived from the proximal SVM model. By adding a weight to each training error and solving the problem directly in the original space, the weighted proximal SVM overcomes the proximal SVM's poor fit for unbalanced and high-dimensional data. Experimental results show that, compared with the standard SVM algorithm, the proposed algorithm improves both classification quality and training speed, making it an efficient algorithm well suited to text classification.
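The idea of attaching a weight to each training error can be illustrated with a minimal sketch. It assumes the usual linear proximal SVM objective with a per-sample weight on the squared error, which reduces to a weighted regularized least-squares system. The abstract does not give the paper's exact formulation, weighting scheme, or solver, so the function names (`fit_weighted_psvm`, `predict`), the parameter `nu`, the inverse-class-frequency default weights, and the dense linear solve are all assumptions made for illustration.

```python
import numpy as np


def fit_weighted_psvm(X, y, nu=1.0, sample_weight=None):
    """Sketch of a linear weighted proximal SVM (illustrative, not the paper's code).

    Minimizes 0.5 * (w'w + gamma^2) + 0.5 * nu * sum_i delta_i * xi_i^2
    with xi_i = 1 - y_i * (x_i'w - gamma), for labels y_i in {+1, -1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    if sample_weight is None:
        # Assumed weighting: inverse class frequency, so both classes carry
        # the same total weight. Requires both classes to be present.
        n_pos = np.sum(y > 0)
        n_neg = n - n_pos
        sample_weight = np.where(y > 0, n / (2.0 * n_pos), n / (2.0 * n_neg))
    # Augment the data with a -1 column so that E @ [w; gamma] = X @ w - gamma.
    E = np.hstack([X, -np.ones((n, 1))])
    # E' * Delta, where Delta is the diagonal matrix of per-sample weights.
    EtW = E.T * sample_weight
    # Setting the gradient to zero gives (I / nu + E' Delta E) u = E' Delta y.
    lhs = np.eye(d + 1) / nu + EtW @ E
    rhs = EtW @ y
    u = np.linalg.solve(lhs, rhs)
    return u[:-1], u[-1]


def predict(X, w, gamma):
    """Classify each row of X by the sign of x'w - gamma."""
    scores = np.asarray(X, dtype=float) @ w - gamma
    return np.where(scores >= 0, 1, -1)
```

The dense solve above costs roughly cubic time in the number of features, so it is only practical for small problems. For high-dimensional text data a sparse or iterative solver would be needed, and the abstract does not say which one the authors use.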
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2005, Issue S1, pp. 1787-1790 (4 pages)
Journal of Tsinghua University(Science and Technology)
Keywords
information processing
text classification
support vector machine (SVM)