Web Spam detection based on structural and temporal information

Cited by: 1
Abstract: A novel framework is proposed that combines structural information and temporal information for Web Spam detection. Targeting several currently popular spamming techniques, a large group of well-designed structural and temporal features is extracted from a series of link graphs covering four successive months and used to train a set of classifiers that distinguish normal websites from spam websites. Experiments on a real-world dataset show that the proposed method is effective for Web Spam detection.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 4, pp. 1243-1246 (4 pages)
Keywords: Spam technology; search engine optimization (SEO); PageRank
References (9)

  • 1. BAEZA-YATES R, RIBEIRO-NETO B. Modern information retrieval [M]. [S.l.]: Addison Wesley Longman Publishing Co Inc, 1999.
  • 2. PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the Web [EB/OL]. (1999-11-11). http://rtewdbpubs.stanford.edu/8090/pub/1999-66/1999.
  • 3. KLEINBERG J M. Authoritative sources in a hyperlinked environment [J]. Journal of the ACM, 1999, 46(5): 604-632.
  • 4. GYONGYI Z, GARCIA-MOLINA H, PEDERSEN J. Combating Web Spam with TrustRank [C]//Proc of International Conference on Very Large Data Bases (VLDB). 2004.
  • 5. WU B, DAVISON B D. Identifying link farm Spam pages [C]//Proc of the 14th Int'l Conf on World Wide Web. New York: ACM Press, 2005: 820-829.
  • 6. DAVISON B D. Recognizing nepotistic links on the Web [EB/OL]. (2000). http://citeseer.ist.psu.edu/davison00recognizing.html.
  • 7. BENCZUR A A, CSALOGANY K, SARLOS T, et al. SpamRank: fully automatic link spam detection [C]//Proc of the 1st AIRWeb. 2005.
  • 8. SHEN Guo-yang, GAO Bin, LIU Tie-yan, et al. Detecting link Spam using temporal information [C]//Proc of ICDM-2006. 2006.
  • 9. FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting [J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.

Co-cited references (21)

  • 1. Zoltan Gyongyi, Hector Garcia-Molina, Jan Pedersen. Combating web spam with TrustRank [M]. In Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada. San Francisco: Morgan Kaufmann, 2004: 576-583.
  • 2. Javier Ortega F, Craig Macdonald, Troyano José A, et al. Spam detection with a content-based random-walk algorithm [M]. Proceedings of the 2nd international workshop on Search and mining user-generated contents, Toronto, Canada. New York: ACM, 2010: 45-51.
  • 3. Wu Baoning, Vinay Goel, Brian D Davison. Topical TrustRank: Using topicality to combat web spam [M]. In Proceedings of the 15th International World Wide Web Conference, Edinburgh, Scotland. New York: ACM, 2006: 63-72.
  • 4. Google. PR0 - Google's PageRank 0 Penalty [J/OL]. 2010-12-28 (2011-03-21). http://pr.efactory.de/e-pr0.shtml.
  • 5. Vijay Krishnan, Rashmi Raj. Web spam detection with anti-trust rank [M]. In Proceedings of the Second International Workshop on Adversarial Information Retrieval on the Web, Washington, USA. New York: ACM, 2006: 37-43.
  • 6. Wu Baoning, Vinay Goel, Brian D Davison. Propagating trust and distrust to demote web spam [M]. In Proceedings of Models of Trust for the Web, Edinburgh, Scotland. New York: ACM, 2006.
  • 7. Gan Qingqing, Torsten Suel. Improving web spam classifiers using link structure [M]. In Proceedings of the Third International Workshop on Adversarial Information Retrieval on the Web, Banff, Alberta, Canada. New York: ACM, 2007: 17-20.
  • 8. Hiroo Saito, Masashi Toyoda, Masaru Kitsuregawa, et al. A large-scale study of link spam detection by graph algorithms [M]. In Proceedings of the Third International Workshop on Adversarial Information Retrieval on the Web, Banff, Alberta, Canada. New York: ACM, 2007: 45-48.
  • 9. Liu Yiqun, Cen Rongwei, Zhang Min, et al. Web Spam with user behavior analysis [M]. Proceedings of AIRWeb'08, Beijing, China, 2008: 108-110.
  • 10. Yang Haixuan, Irwin King, Michael R Lyu. DiffusionRank: a possible penicillin for web spamming [M]. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, Amsterdam, Netherlands. 2007: 431-438.
