
A Translation Model Based Method for Query Session Detection (original Chinese title: 基于翻译模型的查询会话检测方法研究)
Cited by: 1
Abstract: Query session detection aims to identify the consecutive queries that a user submits to satisfy the same information need. It is critical for query log analysis and user behavior characterization. Traditional query session detection methods mostly rely on lexical comparison of queries and therefore suffer from the vocabulary-mismatch problem (i.e., topically related queries may not share any common words). To resolve this issue, this paper proposes a translation model based method for query session detection, which models the relationship between words as word translation probabilities. In this way, the method can capture the semantic relatedness between queries even when they do not share any common words. The paper also proposes two approaches for estimating word translation probabilities from web query logs: the first is based on the time gap between queries, and the second is based on the clicked URLs of queries. Experimental results show that the method significantly outperforms the baselines.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2015, Issue 4, pp. 95-102 (8 pages)
Funding: National Natural Science Foundation of China (61433015, 61272324); National High Technology Research and Development Program of China (2015AA015405)
Keywords: query session detection; vocabulary-mismatch problem; query log
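The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the paper's actual model. It assumes training pairs are built from consecutive queries of the same user within a time gap (the first of the two approaches mentioned), estimates word translation probabilities by relative co-occurrence counts instead of proper EM training, and scores a query pair by an averaged, smoothed translation probability. All function names, the 60-second gap, the `self_prob` weight, and the 0.1 threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def pairs_by_time_gap(log, max_gap_seconds=60):
    """Build training pairs from consecutive queries of one user.

    `log` is a list of (user_id, timestamp_seconds, query) tuples sorted by
    user and time. The 60-second default is an illustrative assumption.
    """
    pairs = []
    for (u1, t1, q1), (u2, t2, q2) in zip(log, log[1:]):
        if u1 == u2 and 0 <= t2 - t1 <= max_gap_seconds and q1 != q2:
            pairs.append((q1, q2))
    return pairs


def estimate_translation_probs(query_pairs):
    """Estimate P(target_word | source_word) by normalized co-occurrence.

    This is a simple stand-in for EM-style translation model training.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for source, target in query_pairs:
        for s, t in product(source.split(), target.split()):
            counts[s][t] += 1.0
    probs = {}
    for s, row in counts.items():
        total = sum(row.values())
        probs[s] = {t: c / total for t, c in row.items()}
    return probs


def translation_score(prev_query, next_query, probs, self_prob=0.5):
    """Average probability that each word of next_query is generated
    ("translated") from the words of prev_query."""
    src_words, tgt_words = prev_query.split(), next_query.split()
    if not src_words or not tgt_words:
        return 0.0
    total = 0.0
    for t in tgt_words:
        word_prob = 0.0
        for s in src_words:
            learned = probs.get(s, {}).get(t, 0.0)
            # Mix in self-translation so identical words still match.
            identity = 1.0 if s == t else 0.0
            word_prob += self_prob * identity + (1 - self_prob) * learned
        total += word_prob / len(src_words)
    return total / len(tgt_words)


def same_session(prev_query, next_query, probs, threshold=0.1):
    """Decide whether two queries belong to the same session."""
    return translation_score(prev_query, next_query, probs) >= threshold


# Toy query log (made-up data for illustration only).
log = [
    ("u1", 0, "cheap flights"),
    ("u1", 20, "airline tickets"),
    ("u1", 45, "airfare deals"),
    ("u1", 4000, "python tutorial"),
    ("u1", 4030, "learn programming"),
]
pairs = pairs_by_time_gap(log)
probs = estimate_translation_probs(pairs)
print(same_session("cheap flights", "airline tickets", probs))
print(same_session("cheap flights", "learn programming", probs))
```

In this toy log, "cheap flights" and "airline tickets" share no words, yet the learned word-to-word probabilities link them, which is the vocabulary-mismatch case the abstract targets. A variant based on clicked URLs would pair queries that led to clicks on the same URL instead of pairing by time gap.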


