Web Spam Taxonomy via Spam Intention Analysis (基于目的分析的作弊页面分类)

Cited by: 7
Abstract: With the rapid development of the Internet, spam pages produced by web spamming are increasingly prevalent and seriously degrade both the retrieval efficiency of search engines and the user experience. Anti-spam has become one of the most important challenges facing search engines. State-of-the-art anti-spam techniques usually build Web spam classifiers from Web page features, either content-based or hyperlink-structure-based, and so cannot deal with different spamming techniques simultaneously; no general, practical detection method exists. This paper proposes an alternative Web spam taxonomy based on an analysis of spam intention, to provide useful guidance for intent-based detection of spam pages.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 2, pp. 95-101 (7 pages)
Funding: National 973 Key Basic Research Program (2004CB318108); National Natural Science Foundation of China (60621062, 60503064, 60736044); National 863 High-Tech Program (2006AA01Z141)
Keywords: computer application; Chinese information processing; Web spam; intention analysis; spam pages taxonomy
