Abstract
To improve retrieval performance, a fusion method based on a Gaussian-exponential mixture model is applied to multi-site fusion in distributed retrieval systems. The method models the relevance-score distributions of the relevant and non-relevant documents in each site's result set with a Gaussian density function and an exponential density function, respectively. The relevance scores are normalized using the mixture model, and the normalized scores are then merged. Because the method accounts for the difference between the score distributions of relevant and non-relevant documents, the computed relevance scores are more accurate, and better-performing sites can be assigned higher weights, which raises the average precision of the overall system. Experimental results indicate that the method outperforms other fusion methods.
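The normalize-then-merge idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the mixture parameters are estimated, how the normalized score is defined, or how site weights are chosen, so everything below is an assumption: the parameters are fitted with a plain EM loop, the normalized score is taken to be the posterior probability that a document belongs to the Gaussian (relevant) component, and the merged score is a weighted sum over sites. The names `posterior`, `fit_mixture`, `fuse`, and `site_weights` are hypothetical, and scores are assumed to be non-negative.

```python
import math


def posterior(s, pi, mu, sigma, lam):
    """P(relevant | score s) under pi*Normal(mu, sigma) + (1 - pi)*Exp(lam)."""
    g = pi * math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    e = (1 - pi) * lam * math.exp(-lam * s)
    total = g + e
    return g / total if total > 0 else 0.0


def fit_mixture(scores, iters=50):
    """Fit the Gaussian-exponential mixture to one site's scores with EM (assumed estimator)."""
    n = len(scores)
    top, bottom = max(scores), min(scores)
    # Assumed initialisation: the relevant component starts near the top score.
    pi = 0.3
    mu = top
    sigma = max((top - bottom) / 4, 1e-3)
    lam = 1.0 / max(sum(scores) / n, 1e-9)
    for _ in range(iters):
        # E-step: responsibility of the relevant (Gaussian) component for each score.
        resp = [posterior(s, pi, mu, sigma, lam) for s in scores]
        r = sum(resp)
        nr = n - r
        if r < 1e-9 or nr < 1e-9:
            break  # one component has collapsed; keep the last usable parameters
        # M-step: re-estimate the mixing weight and both components.
        pi = r / n
        mu = sum(w * s for w, s in zip(resp, scores)) / r
        var = sum(w * (s - mu) ** 2 for w, s in zip(resp, scores)) / r
        sigma = max(math.sqrt(var), 1e-3)
        lam = nr / max(sum((1 - w) * s for w, s in zip(resp, scores)), 1e-9)
    return pi, mu, sigma, lam


def fuse(site_results, site_weights=None):
    """Merge per-site {doc_id: raw_score} maps into one ranking of (doc_id, fused_score)."""
    fused = {}
    for site, results in site_results.items():
        params = fit_mixture(list(results.values()))
        # Hypothetical per-site weight; defaults to 1.0 when none is supplied.
        weight = 1.0 if site_weights is None else site_weights[site]
        for doc, s in results.items():
            fused[doc] = fused.get(doc, 0.0) + weight * posterior(s, *params)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Two hypothetical sites whose raw scores live on different scales.
sites = {
    "A": {"d1": 9.0, "d2": 8.5, "d3": 1.0, "d4": 0.5, "d5": 0.2, "d6": 0.8},
    "B": {"d1": 0.9, "d7": 0.85, "d3": 0.1, "d8": 0.05, "d9": 0.02, "d4": 0.08},
}
for doc, score in fuse(sites):
    print(doc, round(score, 3))
```

Because each site's scores are mapped to a probability before merging, sites with incompatible raw score scales become comparable. Passing a `site_weights` dictionary is one simple way to favor better-performing sites; how the paper actually derives those weights is not stated in the abstract.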
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2008, No. 1, pp. 155-158 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60475021)
Natural Science Foundation of Henan Province of China (Grant No. 2007520013)
Keywords
relevance score
mixture model
multi-site fusion