Abstract
Recently, the amount of web spam has increased dramatically, which greatly degrades the precision and efficiency of search engines. How to combat web spam has become an important problem. This paper combines content-based spam detection with link-based spam detection and proposes an automated two-step method for detecting web spam. The first step is a filtering step, which generates a candidate list of spam pages. In the second step, an automatic classifier identifies the final spam pages among the candidates produced by the filtering step.
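The two-step structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the features, thresholds, or classifier used, so the `Page` fields, the keyword-density and out-link signals, the threshold values, and the hand-set linear score below are all hypothetical stand-ins chosen only to show how a cheap filter can feed a stricter classifier.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Page:
    """A crawled page with one content signal source and two link signals (all hypothetical)."""
    url: str
    text: str
    outlinks: int  # number of outgoing links
    inlinks: int   # number of incoming links


def keyword_density(page: Page, keywords: Set[str]) -> float:
    """Content feature: fraction of words on the page that are known spam keywords."""
    words = page.text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words)


def filter_candidates(pages: List[Page], keywords: Set[str],
                      density_threshold: float = 0.2,
                      outlink_threshold: int = 50) -> List[Page]:
    """Step 1 (filtering): keep pages that look suspicious on either a
    content signal or a link signal. Thresholds are assumed values."""
    return [
        p for p in pages
        if keyword_density(p, keywords) >= density_threshold
        or p.outlinks >= outlink_threshold
    ]


def classify(page: Page, keywords: Set[str]) -> bool:
    """Step 2 (classification): a stand-in linear score with hand-set weights.
    A trained classifier would replace this function in practice."""
    score = (2.0 * keyword_density(page, keywords)
             + 0.01 * page.outlinks
             - 0.05 * page.inlinks)
    return score > 0.5


def detect_spam(pages: List[Page], keywords: Set[str]) -> List[Page]:
    """Run the filter first, then classify only the surviving candidates."""
    candidates = filter_candidates(pages, keywords)
    return [p for p in candidates if classify(p, keywords)]
```

The point of the split is cost: the filter is cheap and deliberately permissive, so the more expensive classifier only has to examine the candidate list rather than every crawled page.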
Source
《计算机与数字工程》
2007, No. 11, pp. 76-78, 152 (4 pages in total)
Computer & Digital Engineering