Abstract
With the rapid development of the Internet, spam pages produced by Web spamming have become increasingly prevalent, seriously degrading both the retrieval efficiency of search engines and the user experience. Anti-spam has become one of the most important challenges that search engines face. However, current anti-spam techniques mostly rely on Web page features, either content-based or hyperlink-structure-based, to construct Web spam classifiers; they cannot handle different spamming techniques simultaneously, and no general, practical detection method exists. Based on an analysis of spamming intentions, this paper proposes an alternative taxonomy of spam pages, intended to guide intent-based detection of spam pages.
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2009, Issue 2, pp. 95-101 (7 pages)
Journal of Chinese Information Processing
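Funding
National Basic Research Program of China (973 Program) (2004CB318108)
National Natural Science Foundation of China (60621062, 60503064, 60736044)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z141)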
Keywords
computer application
Chinese information processing
Web spam
intention analysis
spam page taxonomy