Abstract
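Existing learning-to-rank algorithms ignore the differences among queries: when building a ranking model, they treat all queries in the training set and their relevant documents identically, which degrades the model's performance. This paper characterizes the differences among queries, takes them into account during training, and proposes a supervised-learning-based method for aggregating multiple ranking models. The method first trains a sub-ranking model on each query and its relevant documents, converts the output of each sub-ranking model into feature data that reflects query differences, and then uses supervised learning to aggregate the models. Furthermore, in view of the characteristics of the ranking problem, the paper proposes an aggregation function that directly optimizes ranking performance to combine the sub-ranking models, and optimizes its lower-bound function by gradient ascent. The paper proves that combining the sub-ranking models with this aggregation function performs better than combining them linearly. Experiments on relatively large-scale real-world data show that the multi-model aggregation method that directly optimizes the performance measure can achieve better ranking performance than conventional learning-to-rank models.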
In ranking for document retrieval, queries often differ greatly from one another. Most existing approaches treat the losses from different queries equally. We find that a supervised rank aggregation function can further improve ranking performance. In this paper, the differences among queries are taken into consideration, and a supervised rank aggregation framework based on query similarity is proposed. The approach builds one base ranker for each query and its relevant documents, and then employs a supervised aggregation function to learn the weights of these base rankers. We propose an aggregation function that directly optimizes the performance measure NDCG, referred to as RankAgg.NDCG. We prove that RankAgg.NDCG can achieve better performance than a linear combination of the base rankers. Experimental results on real-world datasets show that our approach outperforms conventional ranking approaches.
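As a rough illustration of the quantities the abstract refers to, the sketch below computes NDCG@k for one query and forms the linear combination of base ranker scores that RankAgg.NDCG is compared against. It is not the paper's aggregation function, which the abstract does not specify. The gain `2^rel - 1` and the `log2` discount are one common NDCG convention and are assumed here; the function names and the toy scores are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG of relevance grades listed in ranked order (assumed gain: 2^rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    """NDCG@k of the ranking obtained by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def linear_aggregate(base_scores, weights):
    """Linear baseline: weighted sum of base ranker scores for each document.

    base_scores[j][i] is the score that base ranker j assigns to document i.
    """
    n_docs = len(base_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, base_scores))
            for i in range(n_docs)]


# Toy query with three documents and two hypothetical base rankers.
labels = [2, 0, 1]                      # graded relevance per document
base_scores = [[0.9, 0.2, 0.1],         # base ranker 1
               [0.1, 0.3, 0.8]]         # base ranker 2
combined = linear_aggregate(base_scores, [0.5, 0.5])
print(ndcg_at_k(combined, labels, k=3))
print(ndcg_at_k(base_scores[1], labels, k=3))
```

In this toy case the equal-weight combination orders the documents better than the second base ranker alone. A supervised aggregation method would learn the weights (or a nonlinear aggregation) from training queries instead of fixing them by hand.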
Source
《计算机学报》
EI
CSCD
PKU Core (Peking University Core Journals)
2014, No. 8, pp. 1658-1668 (11 pages)
Chinese Journal of Computers
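Funding
National Natural Science Foundation of China (60673009, 61105049)
National High Technology Research and Development Program of China (863 Program) (2011AA05A117)
Specialized Research Fund for the Doctoral Program of Higher Education, Doctoral Supervisor Category (65010571)
Science and Technology Project of Tianjin Electric Power Company (KJ14-1-10)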
Keywords
rank aggregation
directly optimizing performance measure
learning to rank
information retrieval