
基于用户行为的色情网站识别 (cited by: 5)

Pornography Web Site Identification Based on User Behavior Analysis
Abstract: Illegal Web resources, pornography sites in particular, pose a major challenge as Internet applications become widespread. Pornography sites differ markedly from ordinary sites in page content, site structure, and visitor population, and these differences lead users to behave differently when visiting the two kinds of site. With the help of a popular commercial search engine in China, a large-scale collection of Web user access logs was gathered. Mining the user behavior recorded in these logs confirmed that behavior on pornography sites differs clearly from behavior on ordinary sites. Based on these differences, a set of user behavior features is designed and combined with machine learning methods to build a behavior-based pornography site identification method. Experiments show that the method identifies pornography sites relatively accurately and efficiently.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list (北大核心); 2013, No. 2, pp. 430-436 (7 pages)
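Funding: National High Technology Research and Development Program of China (863 Program) (2011AA01A205); National Natural Science Foundation of China (60903107, 61073071); Specialized Research Fund for the Doctoral Program of Higher Education (20090002120005)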
Keywords: pornography site; illegal Web resources; user behavior analysis; search engine; Web browsing
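
The pipeline outlined in the abstract (mine access logs, derive per-site behavior features, train a classifier) might be sketched roughly as follows. This is only an illustration under stated assumptions: the log schema, the three features, the nearest-centroid classifier, and the toy data are all hypothetical placeholders, since the abstract names neither the paper's actual features nor its learning algorithm.

```python
from collections import defaultdict
from statistics import mean


def site_features(log):
    """Aggregate per-visit records into one feature vector per site.

    Each record is (site, dwell_seconds, hour_of_day, via_search).
    This schema and the three features are illustrative assumptions,
    not the features used in the paper.
    """
    visits = defaultdict(list)
    for site, dwell, hour, via_search in log:
        visits[site].append((dwell, hour, via_search))
    return {
        site: (
            mean(d for d, _, _ in v) / 60.0,                 # mean dwell time in minutes
            mean(float(h >= 22 or h < 6) for _, h, _ in v),  # share of late-night visits
            mean(float(s) for _, _, s in v),                 # share of visits arriving from search
        )
        for site, v in visits.items()
    }


def train(features, labels):
    """Nearest-centroid model: one mean feature vector per class label."""
    grouped = defaultdict(list)
    for site, label in labels.items():
        grouped[label].append(features[site])
    return {
        label: tuple(mean(col) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(model, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vector, center))
    return min(model, key=lambda label: sq_dist(model[label]))


# Synthetic toy log; the values do not reflect any finding of the paper.
LOG = [
    ("a.example", 30, 23, False),
    ("a.example", 50, 1, False),
    ("b.example", 300, 10, True),
    ("b.example", 420, 15, True),
    ("c.example", 40, 2, False),
    ("d.example", 360, 11, True),
]

feats = site_features(LOG)
model = train(feats, {"a.example": "flagged", "b.example": "ordinary"})
print(predict(model, feats["c.example"]))
print(predict(model, feats["d.example"]))
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library code; any supervised learner could take its place once real behavior features are available.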



