
A Method for Combating Web Spam Pages
Abstract: Recently, the amount of web spam has increased dramatically, which has greatly degraded the precision and efficiency of search engines. How to combat web spam has become an important problem. This paper combines content-based spam detection with link-based spam detection and proposes an automated two-step method for detecting web spam. The first step is a filtering step that generates a candidate list of spam pages. In the second step, an automatic classifier identifies the final spam pages among the candidates produced by the filtering step.
Authors: 蒋涛 (Jiang Tao), 张彬 (Zhang Bin)
Source: Microcomputer Applications (《微型电脑应用》), 2007, No. 4, pp. 23-25, 69 (3 pages in total)
Keywords: Web spam; TrustRank; Link spam
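
The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which signal drives which step, so everything here is an assumption made for illustration: the filter uses a TrustRank-style trust score (a seed-biased PageRank) and keeps low-trust pages as candidates, and the classifier is a caller-supplied function standing in for a content-based model. The names `trust_scores`, `filter_candidates`, `detect_spam`, the threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, List[str]]  # page -> pages it links to


def trust_scores(graph: Graph, seeds: Set[str],
                 damping: float = 0.85, iterations: int = 20) -> Dict[str, float]:
    """Seed-biased PageRank in the spirit of TrustRank (assumed filter signal).

    Trust starts at hand-picked good seed pages and flows along out-links.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    seed_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    scores = dict(seed_mass)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_mass[p] for p in pages}
        for page, targets in graph.items():
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    nxt[target] += share
        scores = nxt
    return scores


def filter_candidates(scores: Dict[str, float], threshold: float) -> List[str]:
    """Step 1 (assumed form): pages whose trust falls below the threshold."""
    return sorted(p for p, s in scores.items() if s < threshold)


def detect_spam(graph: Graph, seeds: Set[str], threshold: float,
                classifier: Callable[[str], bool]) -> List[str]:
    """Step 2: run the classifier only on the candidates from step 1."""
    candidates = filter_candidates(trust_scores(graph, seeds), threshold)
    return [p for p in candidates if classifier(p)]


# Toy data, invented for this example only.
toy_graph: Graph = {
    "good": ["news"],
    "news": ["good"],
    "blog": ["good"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
# Hypothetical content feature: fraction of page text made up of popular keywords.
keyword_density = {"blog": 0.02, "spam1": 0.40, "spam2": 0.35}


def toy_classifier(page: str) -> bool:
    # Stand-in for a trained content classifier.
    return keyword_density.get(page, 0.0) > 0.2


spam_pages = detect_spam(toy_graph, {"good"}, threshold=0.01,
                         classifier=toy_classifier)
print(spam_pages)
```

In this sketch the filter only narrows the set of pages the classifier has to examine. A page such as `blog`, which receives no trust but has ordinary content, becomes a candidate and is then cleared by the second step.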







