A Translation Model Based Method for Query Session Detection (基于翻译模型的查询会话检测方法研究)

Cited by: 1


Abstract: Query session detection aims to identify the consecutive queries a user submits to satisfy the same information need, and it is critical for query log analysis and user behavior characterization. Traditional query session detection methods mostly rely on lexical comparison of query terms, so they suffer from the vocabulary-mismatch problem (i.e., topically related queries may not share any common words). To address this problem, this paper proposes a translation model based method for query session detection that models the relationship between words as word translation probabilities. In this way, the method can capture the semantic relatedness between queries even when they do not share any common words. The paper also proposes two approaches for generating training data from web query logs for translation probability estimation: the first is based on the time gap between queries, and the second is based on the clicked URLs of queries. Experimental results show that the method significantly outperforms the baselines.
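The idea outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the log layout `(user_id, timestamp_seconds, query)`, the 300-second gap threshold, the relative-frequency estimate of translation probabilities (a translation model would more typically be trained with EM), and the self-translation probability of 1.0 are all assumptions made only for illustration.

```python
from collections import defaultdict


def pairs_by_time_gap(log, max_gap=300):
    """Pair consecutive queries of the same user issued within max_gap seconds.

    `log` is assumed to be a list of (user_id, timestamp_seconds, query)
    tuples sorted by user and then by time (hypothetical layout).
    """
    pairs = []
    for prev, curr in zip(log, log[1:]):
        same_user = prev[0] == curr[0]
        if same_user and 0 <= curr[1] - prev[1] <= max_gap:
            pairs.append((prev[2].split(), curr[2].split()))
    return pairs


def estimate_translation_probs(pairs):
    """Estimate P(target word | source word) by normalized co-occurrence counts.

    Assumption: a simple relative-frequency estimate stands in for a
    properly trained translation model.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for src_words, tgt_words in pairs:
        for s in src_words:
            for t in tgt_words:
                counts[s][t] += 1.0
    probs = {}
    for s, row in counts.items():
        total = sum(row.values())
        probs[s] = {t: c / total for t, c in row.items()}
    return probs


def relatedness(q1, q2, probs, self_prob=1.0):
    """Score how well the words of q1 'translate' into the words of q2.

    For each word of q2, average the translation probability from every
    word of q1, then average over the words of q2. Identical words get
    `self_prob` (an assumed value).
    """
    src, tgt = q1.split(), q2.split()
    if not src or not tgt:
        return 0.0
    score = 0.0
    for t in tgt:
        per_word = sum(
            self_prob if s == t else probs.get(s, {}).get(t, 0.0) for s in src
        )
        score += per_word / len(src)
    return score / len(tgt)


# Toy query log (fabricated for illustration only).
log = [
    ("u1", 0, "cheap flights"),
    ("u1", 60, "airline tickets"),
    ("u1", 5000, "python tutorial"),
    ("u2", 10, "cheap flights"),
    ("u2", 40, "airline deals"),
]
probs = estimate_translation_probs(pairs_by_time_gap(log))
print(relatedness("cheap flights", "airline tickets", probs))
print(relatedness("cheap flights", "python tutorial", probs))
```

Two queries would be placed in the same session when the score exceeds a chosen threshold, even though they share no words. Training pairs could instead be formed from queries that share a clicked URL, which corresponds to the second approach mentioned in the abstract; that variant is not shown here.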

Authors: Zhang Zhenzhong (张振中), Sun Le (孙乐), Han Xianpei (韩先培)

Affiliation: Basic Software Center, Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所基础软件中心)

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2015, Issue 4, pp. 95-102 (8 pages)

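Funding: National Natural Science Foundation of China (61433015, 61272324); National High Technology Research and Development Program of China (2015AA015405)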

Keywords: query session detection; vocabulary-mismatch problem; query log

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


