
Research on a Method for Filtering Garbage Pages on Agricultural Websites (cited by: 2)
Abstract: Agricultural websites contain a large number of web pages that carry invalid information. To filter these garbage pages out of the massive volume of web pages, this paper proposes a new method that identifies garbage pages by combining the naive Bayes method with the decision tree method.
Source: Network Security Technology & Application (《网络安全技术与应用》), 2011, Issue 1, pp. 55-57 (3 pages).
Keywords: search engine; garbage pages; naive Bayes method; decision tree
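The abstract states only that naive Bayes and a decision tree are combined; it does not say how they are combined or which page features are used. As a minimal sketch under assumed choices (not the authors' method), the naive Bayes verdict on a page's text can be stacked into a decision tree as one boolean feature, next to a hypothetical link-density flag, and a small ID3 tree (information gain on entropy) then learns how to weigh the two. All data, feature names (`nb_says_garbage`, `link_heavy`) and helper functions are hypothetical; only the Python standard library is used.

```python
import math
from collections import Counter

# Hypothetical sketch, not the authors' method: the naive Bayes verdict on a page's
# text is stacked as one boolean input of a small ID3 decision tree. Label 1 = garbage.

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # log P(c) + sum of log P(w | c); log space avoids underflow on long pages
            score = math.log(self.doc_counts[c] / n_docs)
            for w in doc.split():
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """ID3 on boolean features: split on the feature with the largest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return majority

    def gain(f):
        split = {True: [], False: []}
        for row, y in zip(rows, labels):
            split[row[f]].append(y)
        rest_h = sum(len(ys) / len(labels) * entropy(ys) for ys in split.values() if ys)
        return entropy(labels) - rest_h

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    remaining = [f for f in features if f != best]
    node = {"feature": best, "branches": {}}
    for value in (True, False):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, depth + 1, max_depth
        ) if idx else majority
    return node

def classify(tree, row):
    # Walk from the root until a leaf (a plain label) is reached
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["feature"]]]
    return tree

def page_features(page, nb):
    """Hypothetical features: the NB verdict on the text plus a crude link-density flag."""
    return {
        "nb_says_garbage": nb.predict(page["text"]) == 1,
        "link_heavy": page["links"] > len(page["text"].split()),
    }

# Toy data invented for illustration; NB and the tree are trained on separate sets.
nb = NaiveBayes().fit(
    ["wheat planting guide soil fertilizer irrigation", "corn pest control tips for farmers",
     "rice market price report", "click here win prize cheap ads",
     "cheap ads click here now", "win prize now click"],
    [0, 0, 0, 1, 1, 1],
)
pages = [
    {"text": "soil fertilizer guide for wheat", "links": 3},
    {"text": "rice price report", "links": 2},
    {"text": "click here cheap prize", "links": 25},
    {"text": "win prize click now", "links": 3},
    {"text": "wheat rice corn price", "links": 40},  # keyword-stuffed link page
]
rows = [page_features(p, nb) for p in pages]
tree = build_tree(rows, [0, 0, 1, 1, 1], list(rows[0]))

stuffed = {"text": "corn fertilizer market report", "links": 50}
print(nb.predict(stuffed["text"]))                 # NB alone -> 0 (looks normal)
print(classify(tree, page_features(stuffed, nb)))  # combined -> 1 (garbage)
```

The naive Bayes model is fitted on a separate text corpus from the pages used to grow the tree; fitting both on the same pages would tend to make the NB feature look more reliable than it is and leave the tree little else to learn. In the toy run, the keyword-stuffed page passes the text classifier alone but is caught by the tree through its link count.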