
Comparative Study of Web Meta-Search Engine Based on Data Fusion (基于数据融合的Web元搜索模型比较研究)
Cited by: 2
Abstract: No single search engine outperforms all others under all circumstances, and the best system for a particular task may not be known a priori, so meta-search is worth studying as an effective way to find relevant documents in the vast information space of the WWW. This paper presents three data fusion methods for meta-search: similarity fusion based on linear combination (Similarity Linear Combination), and the rank-based Unbiased and Biased-Bayes methods. Similarity fusion derives the parameters of the linear combination by analyzing the content of a portion of the Web documents. Unbiased merges the result lists of the individual search engines evenly. Biased-Bayes uses the classification service of the ODP directory together with a Bayesian probability model to compute document relevance, and needs little training. Experiments show that the methods are effective: compared with traditional fusion methods they improve average precision clearly and steadily, and they are more efficient than fusing by analyzing the content of every document while reaching comparable effectiveness.
Authors: Ding Yi (丁一), Yang Pengying (杨朋英)
Source: Computer Simulation (《计算机仿真》), CSCD, 2007, No. 4, pp. 120-123 (4 pages)
Funding: Research project funded by Hubei Normal University (2006C10)
Keywords: Web mining; Web search; information retrieval; data fusion
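
The linear-combination similarity fusion named in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the min-max normalization, the hand-picked weights, and the function names `min_max_normalize` and `linear_combination_fusion` are all hypothetical. The paper derives its combination parameters by analyzing the content of part of the Web documents, a step that is not reproduced here.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one engine's raw scores into [0, 1] so engines are comparable.

    Assumption: min-max scaling; the paper does not state its normalization.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination_fusion(engine_scores, weights):
    """Fuse per-engine scores as a weighted sum of normalized scores.

    engine_scores: {engine_name: {doc_id: raw_score}}
    weights:       {engine_name: weight} (hypothetical, hand-picked values)
    Returns document ids sorted by fused score, best first;
    ties are broken by document id for a deterministic order.
    """
    fused = defaultdict(float)
    for engine, scores in engine_scores.items():
        w = weights.get(engine, 0.0)  # engines without a weight contribute nothing
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += w * s
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Made-up scores from two hypothetical engines, for illustration only.
    results = {
        "engine_a": {"d1": 10, "d2": 5, "d3": 0},
        "engine_b": {"d2": 0.9, "d3": 0.1},
    }
    print(linear_combination_fusion(results, {"engine_a": 0.5, "engine_b": 0.5}))
```

A document returned by several engines accumulates score from each of them, so agreement between engines pushes it up the fused list. A document missing from an engine's list simply receives no contribution from that engine.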

