
Multi-Engine Fusion Based on Mixture Model (基于混合模型的多搜索引擎融合)
Abstract: To improve the performance of combined retrieval systems, a multi-engine fusion method based on a mixture model is proposed. The method models the relevance-score distributions of relevant and non-relevant documents with a Gaussian density function and an exponential density function, respectively. A mixture-model-based algorithm normalizes the relevance scores, estimates the relevance scores of non-relevant documents, and merges the scores. This accounts for both the difference between the score distributions of relevant and non-relevant documents and users' evaluation of the performance of each member search engine. Experimental results show that the method's average precision is 37.8% higher on average than that of the member search engines, and also clearly higher than that of three commonly used fusion methods: Sum-CombSUM, Sum-CombMNZ, and Standard-CombSUM.
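Authors: Huo Hua (霍华), Feng Boqin (冯博琴)
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2005, No. 4, pp. 356-359 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National High Technology Research and Development Program of China (2003AA1Z2610).
Keywords: relevance score; mixture model; search engine fusion; score merging; computer simulation; iterative methods; maximum likelihood estimation; normal distribution; parameter estimation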
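
The score model summarized in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's actual algorithm, so everything below is an assumption: the function names (`fit_mixture`, `relevance_probability`, `fuse`), the EM initialization, the use of the posterior probability of relevance as the normalized score, and the per-engine `weights` standing in for users' evaluation of each member engine. Raw scores are assumed to be non-negative.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(x, lam):
    """Density of Exp(lam) at x (zero for negative x)."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0


def relevance_probability(score, params):
    """P(relevant | score) under pi*Gaussian + (1 - pi)*Exponential."""
    pi, mu, sigma, lam = params
    rel = pi * gaussian_pdf(score, mu, sigma)
    non = (1.0 - pi) * exponential_pdf(score, lam)
    total = rel + non
    return rel / total if total > 0.0 else 0.0


def fit_mixture(scores, iters=100):
    """Fit the mixture to a non-empty list of non-negative scores with EM."""
    n = len(scores)
    # Assumed initialization: the top-scoring quarter seeds the Gaussian.
    top = sorted(scores)[-max(2, n // 4):]
    mu = sum(top) / len(top)
    sigma = max(1e-3, math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)))
    lam = 1.0 / max(1e-9, sum(scores) / n)
    pi = 0.25
    for _ in range(iters):
        # E-step: responsibility of the "relevant" component for each score.
        resp = [relevance_probability(s, (pi, mu, sigma, lam)) for s in scores]
        w_rel = sum(resp)
        w_non = n - w_rel
        if w_rel < 1e-9 or w_non < 1e-9:
            break  # one component has vanished; keep the last estimates
        # M-step: weighted maximum-likelihood updates.
        pi = w_rel / n
        mu = sum(r * s for r, s in zip(resp, scores)) / w_rel
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / w_rel
        sigma = max(1e-3, math.sqrt(var))  # floor avoids a collapsed Gaussian
        lam = w_non / max(1e-9, sum((1.0 - r) * s for r, s in zip(resp, scores)))
    return pi, mu, sigma, lam


def fuse(results_by_engine, weights):
    """Merge {engine: {doc_id: raw_score}} into one ranked list.

    Each engine's scores are normalized to posterior relevance
    probabilities, then summed with hypothetical per-engine weights.
    """
    fused = {}
    for engine, results in results_by_engine.items():
        params = fit_mixture(list(results.values()))
        for doc, score in results.items():
            p = relevance_probability(score, params)
            fused[doc] = fused.get(doc, 0.0) + weights[engine] * p
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `fuse({"A": {...}, "B": {...}}, {"A": 0.6, "B": 0.4})` returns `(doc_id, fused_score)` pairs in descending order. One caveat of this kind of model: a Gaussian tail decays faster than an exponential tail, so the posterior can fall again for very high scores; a practical implementation would need to handle that case.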