
An Analysis of Search Engine User Behaviors Based on Hadoop
Cited by: 21
Abstract: Search engine user behavior analysis is an active research topic in web information retrieval. By analyzing users' click behavior and applying Web data mining techniques to extract useful information, it can improve the efficiency of a search engine's retrieval algorithms and retrieval services, and free users from large volumes of unordered search results. To address the scalability and programmability bottlenecks of traditional parallel computing models, this paper presents a Hadoop-based model for processing massive log data. The model uses the Hadoop Distributed File System (HDFS) and the MapReduce parallel computing model to improve system scalability and ease of programming. The model is applied to about 22 million query log records collected from the Sogou search engine over one month. The results offer useful guidance for understanding user search behavior and for evaluating and improving search engine retrieval and ranking algorithms.
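Authors: Wang Zhenyu (王振宇), Guo Li (郭力)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 4, pp. 115-120 (6 pages)
Funding: Guangdong Province Science and Technology Plan Project (2007B01020049)
Keywords: Hadoop; distributed computing; user behavior analysis; massive data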
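The abstract describes processing query logs with the MapReduce model but gives no implementation details. As a rough illustration only, the sketch below simulates the map, shuffle, and reduce phases in plain Python to count how often each query appears. It is not the authors' code and does not use Hadoop itself. The four-field tab-separated log layout, the sample records, and the names `map_query`, `shuffle`, `reduce_count`, and `run_job` are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log layout (an assumption, not the actual Sogou schema):
# user_id <TAB> query <TAB> clicked_rank <TAB> clicked_url
SAMPLE_LOG = [
    "u1\thadoop\t1\thttp://example.com/a",
    "u2\thadoop\t2\thttp://example.com/b",
    "u1\tmapreduce\t1\thttp://example.com/c",
    "malformed line",
    "u3\thadoop\t1\thttp://example.com/a",
]


def map_query(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (query, 1) for each well-formed log record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return  # skip records that do not match the assumed layout
    yield fields[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for one query."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_query(line))
    grouped = shuffle(pairs)
    return dict(reduce_count(key, values) for key, values in grouped.items())


query_counts = run_job(SAMPLE_LOG)
```

On a real cluster the map and reduce functions would run as distributed tasks over data stored in HDFS, with the framework handling the shuffle. The single-process version here only shows how the computation is split into the two phases.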

