Multi-Engine Fusion Based on Mixture Model

Original title: 基于混合模型的多搜索引擎融合

Cited by: 1

Abstract: To improve the performance of combined retrieval systems, a multi-engine fusion method based on a mixture model is proposed. The method models the relevance score distributions of relevant and non-relevant documents with a Gaussian density function and an exponential density function, respectively. A mixture-model-based algorithm normalizes the relevance scores, estimates the relevance scores of non-relevant documents, and merges the scores. The approach thus accounts for both the difference in score distribution between relevant and non-relevant documents and users' evaluations of the retrieval performance of the member search engines. Experimental results show that the method's average precision is 37.8% higher on average than that of the member search engines, and clearly higher than that of three commonly used fusion methods: Sum-CombSUM, Sum-CombMNZ, and Standard-CombSUM.
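The pipeline the abstract outlines can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes non-negative raw scores, fits the exponential-plus-Gaussian mixture with a plain EM loop, uses the posterior probability of relevance as the normalized score, and merges with a weighted CombSUM-style sum. The names `fit_mixture`, `posterior`, `fuse`, and the `engine_weights` parameter (standing in for user-assigned engine ratings) are hypothetical, and documents missing from an engine's list simply contribute 0 here, which may differ from how the paper estimates their scores.

```python
import math


def posterior(s, w, mu, sigma, lam):
    """Probability that score s belongs to the Gaussian (relevant) component."""
    g = w * math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    e = (1 - w) * lam * math.exp(-lam * s)
    return g / (g + e) if g + e > 0 else 0.0


def fit_mixture(scores, iters=200):
    """EM fit of p(s) = (1 - w) * Exp(s; lam) + w * Normal(s; mu, sigma).

    Assumes at least two non-negative scores.
    """
    n = len(scores)
    # Crude initialization (an assumption): treat the top 20% as relevant.
    top = sorted(scores)[-max(2, n // 5):]
    w = len(top) / n
    mu = sum(top) / len(top)
    sigma = max(1e-3, math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)))
    lam = 1.0 / max(1e-9, sum(scores) / n)
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each score.
        resp = [posterior(s, w, mu, sigma, lam) for s in scores]
        r_sum = sum(resp)
        nr_sum = n - r_sum
        if r_sum < 1e-9 or nr_sum < 1e-9:
            break  # one component vanished; keep the last usable parameters
        # M-step: weighted maximum-likelihood updates.
        w = r_sum / n
        mu = sum(r * s for r, s in zip(resp, scores)) / r_sum
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / r_sum
        sigma = max(1e-3, math.sqrt(var))
        lam = nr_sum / max(1e-9, sum((1 - r) * s for r, s in zip(resp, scores)))
    return w, mu, sigma, lam


def fuse(runs, engine_weights=None):
    """runs: {engine: {doc_id: raw_score}} -> [(doc_id, fused_score)], best first."""
    fused = {}
    for engine, run in runs.items():
        params = fit_mixture(list(run.values()))
        weight = 1.0 if engine_weights is None else engine_weights[engine]
        for doc, s in run.items():
            # Normalized score = posterior probability of relevance.
            fused[doc] = fused.get(doc, 0.0) + weight * posterior(s, *params)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `fuse({"A": run_a, "B": run_b})` with two score dictionaries returns one merged ranking. Because every engine's scores are mapped onto the same 0 to 1 probability scale before summing, engines with very different raw score ranges become comparable.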

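Authors: Huo Hua, Feng Boqin

Affiliation: School of Electronic and Information Engineering, Xi'an Jiaotong University

Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2005, No. 4, pp. 356-359 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals

Funding: National High Technology Research and Development Program of China (2003AA1Z2610).

Keywords: relevance score; mixture model; search engine fusion; score merging; Computer simulation; Iterative methods; Maximum likelihood estimation; Normal distribution; Parameter estimation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]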

Citation network

References (7)

1. Xiang Rihua, Wang Runsheng. A range image segmentation algorithm based on Gaussian mixture model [J]. Journal of Software, 2003, 14(7): 1250-1257. Cited by: 54
2. Savoy J. Combining multiple strategies for effective monolingual and cross-language retrieval [J]. Information Retrieval, 2004, 7(1): 121-148.
3. Montague M, Aslam J. Relevance score normalization for metasearch [A]. The ACM Tenth International Conference on Information and Knowledge Management, Atlanta, USA, 2001.
4. Sever H, Tolun M R. Comparison of normalization techniques for metasearch [A]. Advances in Information Systems (ADVIS), Izmir, Turkey, 2002.
5. Manmatha R, Rath T, Feng F. Modeling score distributions for combining the outputs of search engines [A]. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, 2001.
6. McLachlan G, Peel D. Finite mixture models [M]. New York: John Wiley and Sons Inc, 2001. 40-51.
7. Arampatzis A, van Hameren A. Maximum likelihood estimation for filtering thresholds [A]. 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, 2001.

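Citing documents (1)

1. Liu Junqiang, Miao Kejian, Huo Hua. Multi-site fusion based on mixture model in distributed retrieval systems [J]. Computer Engineering and Applications, 2008, 44(1): 155-158.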
