Abstract
With the rapid development of the Internet, classifying Web pages correctly is becoming increasingly important. A major difficulty in Web page classification is the high dimensionality of the feature space. The support vector machine (SVM) classifier has been shown to outperform other classification methods, but training it takes more time than other algorithms. This paper proposes a feature selection method that chooses the words that most confidently predict the category of a text. Experiments on Chinese text show that the method shortens training time without reducing classification accuracy.
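The abstract does not say how a word's certainty is measured, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that a word's confidence is the largest estimated probability of any single category given that the word occurs in a document, computed from document frequencies, and that the k highest-scoring words are kept as features before SVM training. The function name `select_confident_terms` and the parameters `k` and `min_df` are hypothetical.

```python
from collections import Counter, defaultdict


def select_confident_terms(docs, labels, k, min_df=2):
    """Rank terms by max_c P(c | term) and return the k highest-scoring ones.

    Assumption (not from the paper): confidence is estimated from
    document frequencies, and rare terms (below min_df) are ignored.

    docs   : list of token lists (text already segmented into words)
    labels : list of category labels, one per document
    """
    df = Counter()                      # number of documents containing each term
    df_by_class = defaultdict(Counter)  # term -> category -> document count
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[term][label] += 1

    scored = []
    for term, n in df.items():
        if n < min_df:
            continue
        # Share of this term's documents that fall in its dominant category.
        confidence = max(df_by_class[term].values()) / n
        scored.append((confidence, n, term))

    # Highest confidence first; ties broken by document frequency, then by term.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [term for _, _, term in scored[:k]]


# Toy corpus, invented for illustration only.
docs = [
    ["football", "match", "goal"],
    ["football", "team", "news"],
    ["stock", "market", "news"],
    ["stock", "price", "market"],
]
labels = ["sports", "sports", "finance", "finance"]
selected = select_confident_terms(docs, labels, k=3)
```

In this toy corpus, "news" appears equally in both categories and is therefore ranked below words that occur in only one category. Restricting the document vectors to the selected words reduces the dimensionality of the input to the SVM, which is the source of the training-time saving the abstract reports.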
Source
Science Technology and Engineering (《科学技术与工程》)
2011, No. 6, pp. 1359-1362 (4 pages)
Keywords
feature selection
Web page classification
support vector machine