
Optimizing Web Page Classification Algorithm by Using Hyperlinks (基于链接关系的网页分类优化算法)

Cited by: 2
Abstract: Link-based Web page classification algorithms suffer from noisy neighboring pages that interfere with the classification result. To address this problem, an optimization method is proposed that uses the similarity between pages. Neighboring pages that meet a similarity threshold are assigned different weights according to their relationship to the target page, and these weighted neighbors are combined with the classification result of a support vector machine (SVM) to compute the category of the page. Experiments show that the proposed algorithm improves precision, recall, and F1 value.
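The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes cosine similarity over term-weight vectors, a single threshold, a simple weighted vote, and three relation types (parent, child, sibling). All names and numeric values (`RELATION_WEIGHTS`, `SIM_THRESHOLD`, `SELF_WEIGHT`, `classify_page`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical settings; the abstract does not state the actual values.
RELATION_WEIGHTS = {"parent": 0.5, "child": 0.5, "sibling": 1.0}
SIM_THRESHOLD = 0.3   # neighbors below this similarity are treated as noise
SELF_WEIGHT = 2.0     # weight given to the page's own SVM scores


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_page(page_vec, svm_scores, neighbors):
    """Combine the page's own SVM scores with votes from similar neighbors.

    page_vec   -- term-weight vector of the target page
    svm_scores -- dict mapping label -> SVM score for the target page
    neighbors  -- list of (relation, neighbor_vec, neighbor_label) tuples
    """
    totals = defaultdict(float)
    for label, score in svm_scores.items():
        totals[label] += SELF_WEIGHT * score
    for relation, vec, label in neighbors:
        # Keep only neighbors that meet the similarity threshold.
        if cosine(page_vec, vec) >= SIM_THRESHOLD:
            totals[label] += RELATION_WEIGHTS[relation]
    return max(totals, key=totals.get)
```

In this sketch a dissimilar neighbor contributes nothing regardless of its relation type, which is one simple way to keep noisy neighbors from overriding the SVM result.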
Source: Computer and Modernization (《计算机与现代化》), 2014, No. 5, pp. 14-17, 23 (5 pages)
Funding: National Teaching Team Construction Project (00700054J1901)
Keywords: Web page classification; neighboring page; similarity; support vector machine

