Abstract
Recently, the amount of web spam has increased dramatically, greatly degrading the precision and efficiency of search engines, and combating it has become an important problem. This paper combines content-based spam detection with link-based spam detection and proposes an automated two-step method for detecting spam pages. The first step is a filtering step that generates a candidate list of spam pages. In the second step, an automatic classifier identifies the final spam pages among the candidates produced by the filter.
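The two-step pipeline (filter, then classify) can be sketched as below. The abstract does not name the features, thresholds, or classifier, so every specific here is a hypothetical stand-in: `content_score` and `link_score` are assumed per-page signals in [0, 1], the thresholds are arbitrary, and a plain perceptron substitutes for the unspecified classifier.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    # Hypothetical content signal, e.g. share of words taken from popular queries (0..1).
    content_score: float
    # Hypothetical link signal, e.g. share of in-links from suspected link farms (0..1).
    link_score: float


def features(page):
    return (page.content_score, page.link_score)


def is_candidate(page, content_threshold=0.3, link_threshold=0.5):
    """Step 1 (filter): a loose OR of a content test and a link test, favouring recall.
    The thresholds are illustrative assumptions, not values from the paper."""
    return page.content_score > content_threshold or page.link_score > link_threshold


def score(model, page):
    """Linear decision value w . x + b; positive means 'spam'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, features(page))) + b


def train_perceptron(pages, labels, max_epochs=1000):
    """Step 2 (classifier): a perceptron trained on labelled candidate pages.
    Labels are +1 for spam and -1 for normal pages (a stand-in classifier)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for page, y in zip(pages, labels):
            if y * score((w, b), page) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, features(page))]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training page is now on the correct side
            break
    return w, b


def detect_spam(pages, model):
    """Run the filter, then classify only the surviving candidates."""
    candidates = [p for p in pages if is_candidate(p)]
    return [p.url for p in candidates if score(model, p) > 0]


if __name__ == "__main__":
    train = [Page("s1", 0.9, 0.8), Page("s2", 0.8, 0.9), Page("s3", 0.7, 0.7),
             Page("n1", 0.4, 0.1), Page("n2", 0.1, 0.6), Page("n3", 0.35, 0.2)]
    labels = [1, 1, 1, -1, -1, -1]
    model = train_perceptron(train, labels)
    pages = [Page("a", 0.85, 0.85), Page("b", 0.05, 0.05), Page("c", 0.375, 0.15)]
    print(detect_spam(pages, model))  # ['a']
```

In this toy run, page `b` never reaches the classifier because the filter drops it, while `c` passes the filter but is rejected by the classifier. This mirrors the division of labour the abstract describes: a cheap, permissive first step and a more precise second step.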
Source
Microcomputer Applications (《微型电脑应用》)
2007, No. 4, pp. 23-25, 69 (3 pages)