
Query Structure Analysis Based on Pointwise Mutual Information (PMI)  Cited by: 3
Abstract: In Web search engines, effective analysis of the structure of user queries helps the engine understand the user's query intent and improves retrieval effectiveness. This paper proposes a simple and efficient query structure analysis method based on pointwise mutual information (PMI). The method comprises an offline training algorithm based on MapReduce and a bottom-up online algorithm for building the query tree. Experiments show that the method segments queries at high speed while maintaining segmentation quality comparable to that of other approaches. Experiments on the TREC WT10g dataset further show that the method improves retrieval performance in terms of MAP, p@5, and p@10.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2012, Issue 5, pp. 33-39 (7 pages)
Funding: National Natural Science Foundation of China (60903139, 60873243, 60933005); National 863 Program key projects (2010AA012502, 2010AA012503)
Keywords: query structure analysis; MapReduce; online query analysis tree
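
The abstract names two components: offline training with MapReduce and bottom-up online construction of a query tree. The record does not give the algorithms themselves, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It uses the standard definition PMI(a, b) = log(p(a, b) / (p(a) p(b))). The function names (`map_counts`, `reduce_counts`, `train`, `pmi`, `build_tree`), the toy query log, and the rule of scoring two adjacent subtrees by the PMI of their boundary terms are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import chain


def map_counts(query):
    """Map step (illustrative): emit (key, 1) for every term and adjacent term pair."""
    terms = query.split()
    for term in terms:
        yield (term,), 1
    for left, right in zip(terms, terms[1:]):
        yield (left, right), 1


def reduce_counts(pairs):
    """Reduce step (illustrative): sum the emitted counts per key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def train(queries):
    """Offline training, simulated in one process instead of on a cluster."""
    return reduce_counts(chain.from_iterable(map_counts(q) for q in queries))


def pmi(counts, a, b):
    """PMI(a, b) = log(p(a, b) / (p(a) * p(b))); -inf for an unseen pair."""
    # Totals are recomputed on every call to keep the sketch short.
    n_uni = sum(c for key, c in counts.items() if len(key) == 1)
    n_bi = sum(c for key, c in counts.items() if len(key) == 2)
    c_ab = counts.get((a, b), 0)
    if c_ab == 0:
        return float("-inf")
    p_ab = c_ab / n_bi
    p_a = counts[(a,)] / n_uni
    p_b = counts[(b,)] / n_uni
    return math.log(p_ab / (p_a * p_b))


def leftmost(node):
    """First term covered by a tree node (a node is a str or a 2-tuple)."""
    return node if isinstance(node, str) else leftmost(node[0])


def rightmost(node):
    """Last term covered by a tree node."""
    return node if isinstance(node, str) else rightmost(node[1])


def build_tree(query, counts):
    """Bottom-up construction: repeatedly merge the adjacent pair with highest PMI.

    Assumption: two subtrees are scored by the PMI of their boundary terms.
    """
    nodes = query.split()
    if not nodes:
        return None
    while len(nodes) > 1:
        scores = [pmi(counts, rightmost(l), leftmost(r))
                  for l, r in zip(nodes, nodes[1:])]
        i = max(range(len(scores)), key=scores.__getitem__)
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
    return nodes[0]


# Toy query log, invented purely for this example.
log = ["new york hotels", "new york times", "cheap hotels",
       "cheap flights new york", "hotels in new york"]
counts = train(log)
print(build_tree("cheap new york hotels", counts))
# prints ('cheap', (('new', 'york'), 'hotels'))
```

In this sketch the most strongly associated pair ("new", "york") is merged first, so tightly bound phrases end up deepest in the tree. In a real deployment the map and reduce steps would run as a distributed job over a large query log, and the resulting counts would be loaded by the online component.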

References (13)

  • 1. T. Tao, C. Zhai. An exploration of proximity measures in information retrieval[C]//Proceedings of SIGIR'07: 295-302.
  • 2. J. Bai, Y. Chang, H. Cui, et al. Investigation of partial query proximity in web search[C]//Proceedings of 17th International Conference on World Wide Web, 2008: 1183-1184.
  • 3. J. Huang, J. Gao, J. Miao, et al. Exploring web scale language models for search query processing[C]//Proceedings of WWW 2010.
  • 4. R. Jones, B. Rey, O. Madani, et al. Generating query substitutions[C]//Proceedings of 15th World Wide Web, 2006: 387-396.
  • 5. G. Kumaran, V. R. Carvalho. Reducing long queries using query quality predictors[C]//Proceedings of SIGIR'09, 2009: 564-571.
  • 6. D. Metzler, W. B. Croft. A Markov random field model for term dependencies[C]//Proceedings of SIGIR'05, 2005: 472-479.
  • 7. K. M. Risvik, T. Mikolajewski, P. Boros. Query segmentation for Web search[C]//Proceedings of WWW 2003.
  • 8. S. Bergsma, Q. I. Wang. Learning noun phrase query segmentation[C]//Proceedings of EMNLP-CoNLL 2007: 819-826.
  • 9. B. Tan, F. Peng. Unsupervised query segmentation using generative language models and Wikipedia[C]//Proceedings of WWW 2008: 347-356.
  • 10. M. Hagen, M. Potthast, B. Stein, et al. The power of naive query segmentation[C]//Proceedings of SIGIR'10, 2010: 797-798.
