
搜索引擎用户点击行为分析 (cited by: 45)

User Behavior Analysis for a Large-scale Search Engine
Abstract: Tianwang (北大“天网”) is a large-scale distributed search engine system that, at the time of writing, maintained an index of about 240 million web pages and 20 million FTP files. Based on the clickthrough data in the click log of Tianwang's WWW search service, this paper identifies several general regularities: the number of distinct URLs clicked by users follows Heaps' law; the click frequency of URLs versus their rank is well fit by a Zipf-like distribution; the click frequency of a URL is correlated with its page size; URL clicks exhibit a high degree of temporal locality; and the sequence of clicks shows self-similarity. A new and effective algorithm is presented that uses the click log to find the queries related to a given query. These results are valuable for understanding users' search behavior, improving the design of search engine systems, and raising the efficiency and quality of retrieval services.
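Authors: Wang Jimin (王继民), Peng Bo (彭波)
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), indexed in CSSCI and the PKU Core list (北大核心), 2006, No. 2, pp. 154-162 (9 pages)
Funding: Key Program of the National Natural Science Foundation of China (60435020); Doctoral Program Foundation of the Ministry of Education (20030001076); China Postdoctoral Science Foundation (2004036182).
Keywords: search engine; click log; user behavior; characteristic distribution; similar query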
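The two distributional claims in the abstract can be illustrated with a small sketch. Heaps' law is commonly written as V(n) ≈ K·n^β with 0 < β < 1, where V(n) is the number of distinct items seen after n events, and a Zipf-like law as f(r) ∝ r^(-α), where f(r) is the frequency of the item at rank r. The code below is only an illustration under stated assumptions: the click log is synthetic, the function names (`heaps_curve`, `rank_frequency`, `loglog_slope`, `synthetic_clicks`) are hypothetical, and the log-log least-squares fit is a generic technique, not necessarily the fitting procedure used in the paper.

```python
import math
import random
from collections import Counter


def heaps_curve(clicked_urls):
    """Return the number of distinct URLs seen after each click."""
    seen = set()
    curve = []
    for url in clicked_urls:
        seen.add(url)
        curve.append(len(seen))
    return curve


def rank_frequency(clicked_urls):
    """Return click counts sorted from the most to the least clicked URL."""
    return sorted(Counter(clicked_urls).values(), reverse=True)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


def synthetic_clicks(n_clicks, n_urls, alpha, seed=0):
    """Draw clicks from a Zipf-like popularity law (synthetic data only)."""
    rng = random.Random(seed)
    urls = [f"url{i}" for i in range(1, n_urls + 1)]
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_urls + 1)]
    return rng.choices(urls, weights=weights, k=n_clicks)


# Assumed parameters for the synthetic log; they are not taken from the paper.
clicks = synthetic_clicks(n_clicks=20000, n_urls=5000, alpha=0.8)

# Heaps-style growth: distinct URLs as a function of the number of clicks.
curve = heaps_curve(clicks)
heaps_beta = loglog_slope(range(1, len(curve) + 1), curve)

# Zipf-like rank-frequency relation, fitted on the 100 most clicked URLs.
freqs = rank_frequency(clicks)
top = freqs[:100]
zipf_alpha = -loglog_slope(range(1, len(top) + 1), top)

print(f"Heaps exponent estimate: {heaps_beta:.2f}")
print(f"Zipf-like exponent estimate: {zipf_alpha:.2f}")
```

On a real click log, `clicks` would be replaced by the sequence of clicked URLs in time order. A Heaps exponent below 1 indicates that new URLs appear ever more slowly as clicks accumulate, which is the kind of regularity that motivates result caching.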

