Abstract
Chinese Web page classification is a hot research area in data mining, and the support vector machine (SVM) is an effective method for learning classification knowledge from massive data. This paper first presents a model of an automatic Chinese Web page classification system based on SVM. It then describes the system's design and implementation in detail and discusses key techniques in Chinese Web page classification, including Web page pre-processing, feature selection, and feature weight computation. A pre-classification method based on a preset keyword list is proposed, and its principles and implementation are described in detail. Experimental results show that, compared with using the SVM classifier alone, the method not only substantially reduces classification time but also clearly improves precision and recall.
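The abstract does not spell out how keyword pre-classification and the SVM interact. The following is a minimal Python sketch of one plausible two-stage arrangement, not the paper's implementation. It assumes that pages have already been segmented into words (e.g. by a Chinese word segmenter) and that a one-vs-rest linear SVM has already been trained, so only its decision function is evaluated. The keyword table, the thresholds `min_hits` and `min_margin`, the TF-IDF formula tf·log(N/df), and all function names are hypothetical illustrations.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical keyword table: category -> preset keywords (illustrative only).
KeywordTable = Dict[str, Set[str]]
# Hypothetical trained model per category: (feature weights w, bias b).
LinearModels = Dict[str, Tuple[Dict[str, float], float]]


def keyword_preclassify(tokens: List[str], table: KeywordTable,
                        min_hits: int = 3, min_margin: int = 2) -> Optional[str]:
    """Assign a category only when keyword evidence is decisive; else None."""
    counts = Counter(tokens)
    # Score each category by how many token occurrences hit its keyword list.
    scores = {cat: sum(counts[w] for w in words) for cat, words in table.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_cat, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Require enough hits and a clear lead over the second-best category.
    if best >= min_hits and best - runner_up >= min_margin:
        return best_cat
    return None


def tfidf(tokens: List[str], doc_freq: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """TF-IDF weights tf * log(N / df), skipping terms unseen in training."""
    counts = Counter(tokens)
    return {t: c * math.log(n_docs / doc_freq[t])
            for t, c in counts.items() if doc_freq.get(t, 0) > 0}


def linear_svm_predict(x: Dict[str, float], models: LinearModels) -> str:
    """One-vs-rest linear decision: argmax over categories of w.x + b."""
    def score(cat: str) -> float:
        w, b = models[cat]
        return sum(w.get(t, 0.0) * v for t, v in x.items()) + b
    return max(models, key=score)


def classify(tokens: List[str], table: KeywordTable, doc_freq: Dict[str, int],
             n_docs: int, models: LinearModels) -> Tuple[str, str]:
    """Two-stage pipeline: cheap keyword pass first, SVM only as a fallback."""
    label = keyword_preclassify(tokens, table)
    if label is not None:
        return label, "keywords"
    return linear_svm_predict(tfidf(tokens, doc_freq, n_docs), models), "svm"
```

For example, with `table = {"sports": {"match", "team", "goal"}, "finance": {"stock", "fund", "bank"}}`, the tokens `["team", "match", "goal", "match", "today"]` score 4 keyword hits for `sports` and 0 for `finance`, so they are labelled `("sports", "keywords")` without touching the SVM. A page with only one keyword hit falls through to the linear decision function. The abstract attributes the reduced classification time to the pre-classification step, and in this sketch that saving comes from the decisive pages that skip feature weighting and SVM scoring entirely.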
Source
Computer Engineering and Applications (《计算机工程与应用》)
2010, No. 1, pp. 125-128 (4 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Keywords
support vector machine
Chinese Web page classification
text classification
machine learning