Text Categorization Improvement Based on Partial Binary Tree Multi-classification SVM
Abstract: To address the unclassifiable regions that arise in traditional multi-class support vector machine (SVM) classification and to improve classification performance in text categorization, an improved multi-class SVM algorithm based on a partial binary tree is proposed. The algorithm computes the information entropy of the text feature parameters of the training set from the distribution of the samples, and combines the entropy values with the Euclidean distance formula to obtain a similarity measure between text classes. The similarity measure determines the classification direction of the partial binary tree structure, along which the training set is learned and the individual two-class sub-SVM classifiers are built. Experimental results show that the algorithm achieves high classification performance and handles practical text categorization problems better.
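The abstract does not give the exact formulas, so the following is only a minimal sketch of how an entropy-weighted Euclidean similarity measure could fix the split order of a partial binary tree. Everything specific here is an assumption rather than the paper's method: each class is represented by its centroid, features are non-negative (for example term frequencies), each feature is weighted by one minus its normalized entropy across classes, and the class farthest on average from the remaining classes is split off first. All function names are hypothetical.

```python
import math


def centroids(samples):
    """Mean feature vector per class; samples maps label -> list of vectors."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in samples.items()
    }


def feature_entropy_weights(cents):
    """Assumed weighting: 1 - normalized entropy of each feature's mass across classes.

    Requires at least two classes and non-negative feature values.
    """
    labels = list(cents)
    n_features = len(cents[labels[0]])
    max_entropy = math.log(len(labels))
    weights = []
    for j in range(n_features):
        total = sum(cents[c][j] for c in labels)
        if total == 0:
            weights.append(0.0)  # feature never occurs, so it cannot separate classes
            continue
        probs = [cents[c][j] / total for c in labels]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        weights.append(1.0 - entropy / max_entropy)
    return weights


def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature entropy weights."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def partial_tree_order(samples):
    """Order in which classes are split off, one per tree node (assumed: farthest first)."""
    cents = centroids(samples)
    weights = feature_entropy_weights(cents)
    remaining = sorted(cents)
    order = []
    while len(remaining) > 1:
        def mean_dist(c):
            others = [o for o in remaining if o != c]
            return sum(
                weighted_distance(cents[c], cents[o], weights) for o in others
            ) / len(others)

        best = max(remaining, key=mean_dist)
        order.append(best)
        remaining.remove(best)
    order.append(remaining[0])
    return order
```

Given the returned order, node i of the tree would train one two-class SVM separating class `order[i]` from all classes that follow it, so k classes need k - 1 binary classifiers and every test sample ends in exactly one leaf.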
Author: 周靖 (Zhou Jing)
Source: Journal of Guangdong University of Petrochemical Technology (《广东石油化工学院学报》), 2011, No. 4, pp. 56-58, 66 (4 pages)
Keywords: text categorization; multi-class classification; support vector machine (SVM); partial binary tree; entropy