
Research on Modeling and Simulation of Automatic Web Page Classification (times cited: 3)

Modeling and Simulation of Web Automatic Classification
Abstract: Automatic web page classification is studied so that users can quickly find the pages they need. The web holds an enormous number of pages, and web data is semi-structured, massive, high-dimensional text. Traditional text classification methods cannot reduce the dimensionality or eliminate redundant information, so they easily suffer from the curse of dimensionality and classify pages with low accuracy, which makes it hard for users to find the pages they need. To improve classification accuracy, an automatic web page classification method based on a principal component support vector machine, that is, principal component analysis (PCA) combined with a support vector machine (SVM), is proposed. First, the web data is preprocessed and feature vectors are extracted to eliminate redundant information. Next, PCA reduces the dimensionality of the feature vectors. Finally, an SVM classifies the pages automatically. Simulation experiments on a web page dataset show that classification accuracy exceeds 95% and that classification speed also increases, indicating that the PCA-based SVM is an effective web page classification method.
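The abstract describes a three-stage pipeline (preprocess and extract feature vectors, reduce them with PCA, classify with an SVM) but does not give an implementation. The following is a minimal, dependency-free sketch of that pipeline under stated assumptions, not the authors' code: features are raw term frequencies after removing a hypothetical stop-word list, PCA is computed by power iteration with deflation on the covariance matrix, and the SVM is a linear soft-margin SVM trained with Pegasos-style subgradient steps, extended to several categories by one-vs-rest. The names `PcaSvmClassifier`, `term_vectors`, `fit_pca`, `train_linear_svm`, the parameters `lam` and `epochs`, and the toy documents are all illustrative.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not specify its preprocessing.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to"}

def tokenize(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_vectors(docs, vocab=None):
    """Feature extraction: raw term-frequency vectors over a fixed vocabulary."""
    tokens = [tokenize(d) for d in docs]
    if vocab is None:
        vocab = sorted({t for toks in tokens for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokens:
        v = [0.0] * len(vocab)
        for t, c in Counter(toks).items():
            if t in index:  # words not seen during training are ignored
                v[index[t]] = float(c)
        vectors.append(v)
    return vectors, vocab

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pca(X, k, iters=200, seed=0):
    """PCA: top-k covariance eigenvectors via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])  # Rayleigh quotient = eigenvalue
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mean, components

def project(X, mean, components):
    """Dimensionality reduction: coordinates of the centered rows on the components."""
    return [[dot([x - m for x, m in zip(row, mean)], c) for c in components] for row in X]

def train_linear_svm(Z, y, lam=0.01, epochs=50):
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style subgradient steps."""
    # Assumption: samples are visited cyclically (not at random) for reproducibility,
    # and a constant 1.0 feature stands in for the bias term.
    w = [0.0] * (len(Z[0]) + 1)
    for t in range(1, epochs * len(Z) + 1):
        i = (t - 1) % len(Z)
        x, label = Z[i] + [1.0], y[i]
        eta = 1.0 / (lam * t)
        violated = label * dot(w, x) < 1.0  # margin check uses the current w
        w = [(1.0 - eta * lam) * wi for wi in w]
        if violated:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

class PcaSvmClassifier:
    """Hypothetical pipeline: term vectors -> PCA -> one-vs-rest linear SVMs."""
    def __init__(self, n_components=2, lam=0.01, epochs=50):
        self.k, self.lam, self.epochs = n_components, lam, epochs

    def fit(self, docs, labels):
        X, self.vocab = term_vectors(docs)
        self.mean, self.components = fit_pca(X, self.k)
        Z = project(X, self.mean, self.components)
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_linear_svm(Z, [1 if lab == c else -1 for lab in labels], self.lam, self.epochs)
            for c in self.classes
        }
        return self

    def predict(self, docs):
        X, _ = term_vectors(docs, self.vocab)
        Z = project(X, self.mean, self.components)
        return [max(self.classes, key=lambda c: dot(self.weights[c], z + [1.0])) for z in Z]

train_docs = [
    "football team goal match highlights", "the football team scored a goal in the match",
    "football match goal team league", "stock market price shares rally",
    "the stock price of shares on the market fell", "market shares stock price index",
]
train_labels = ["sports"] * 3 + ["finance"] * 3
model = PcaSvmClassifier(n_components=2).fit(train_docs, train_labels)
print(model.predict(["football team match and goal", "stock market price shares"]))  # → ['sports', 'finance']
```

The toy corpus only demonstrates the data flow: a 14-word vocabulary is reduced to 2 principal components before the SVMs see it. It says nothing about the 95% accuracy reported for the paper's dataset. The hand-written PCA and Pegasos-style solver keep the sketch dependency-free; a practical version would more likely use library implementations such as scikit-learn's `PCA` and `SVC`, possibly with TF-IDF weighting instead of raw counts.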
Authors: 周序生 (Zhou Xusheng), 李爽 (Li Shuang)
Source: Computer Simulation (《计算机仿真》), 2011, No. 10, pp. 121-124, 252 (5 pages). Indexed in CSCD and the Peking University Core Journals list.
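Funding: Hunan Provincial Department of Science and Technology Program (2010FJ3024); Hunan University of Technology Teaching Reform Research Project (09A02)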
Keywords: web page classification; principal component analysis (PCA); support vector machine (SVM); data mining
