Because users increasingly describe their information needs with unclear and imprecise keywords, it has become necessary to expand their original search queries with additional words that best capture their actual intent. The selection of suitable additional words generally depends on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating the degree of relatedness between a candidate expansion word and the query keywords: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents where the query keywords appear; (2) proximity, where more importance is assigned to terms having a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances the retrieval performance as compared to the baseline.
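The two criteria and the Pareto-based selection can be illustrated with a small sketch. It is not the authors' implementation: the function names, the tokenized toy documents, the choice to count documents containing at least one query keyword, the use of mean minimum token distance as the proximity measure, and the SPEA2-style raw fitness (lower is better, 0 means non-dominated) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (co-occurrence count, negated mean distance); both objectives are maximised.
Objectives = Tuple[float, float]


def score_candidates(docs: Sequence[Sequence[str]], query: Set[str]) -> Dict[str, Objectives]:
    """Score every non-query term on co-occurrence and proximity (assumed definitions)."""
    cooc: Dict[str, int] = {}
    dist_sum: Dict[str, float] = {}
    for tokens in docs:
        q_pos = [i for i, t in enumerate(tokens) if t in query]
        if not q_pos:
            continue  # only documents containing a query keyword are counted
        best: Dict[str, int] = {}
        for i, t in enumerate(tokens):
            if t in query:
                continue
            d = min(abs(i - p) for p in q_pos)  # distance to the nearest query keyword
            best[t] = min(d, best.get(t, d))
        for t, d in best.items():
            cooc[t] = cooc.get(t, 0) + 1
            dist_sum[t] = dist_sum.get(t, 0.0) + d
    return {t: (float(cooc[t]), -dist_sum[t] / cooc[t]) for t in cooc}


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def strength_fitness(scores: Dict[str, Objectives]) -> Dict[str, int]:
    """SPEA2-style raw fitness: sum of the strengths of all terms dominating a term."""
    strength = {t: sum(dominates(scores[t], scores[u]) for u in scores) for t in scores}
    return {t: sum(strength[u] for u in scores if dominates(scores[u], scores[t]))
            for t in scores}


def expand(docs: Sequence[Sequence[str]], query: Set[str], k: int = 3) -> List[str]:
    """Return the k candidate terms with the lowest raw fitness (ties broken by name)."""
    fitness = strength_fitness(score_candidates(docs, query))
    return sorted(fitness, key=lambda t: (fitness[t], t))[:k]


docs = [
    ["heart", "attack", "myocardial", "infarction", "risk"],
    ["heart", "disease", "myocardial", "risk", "factor"],
    ["football", "risk", "game"],
]
print(expand(docs, {"heart"}, k=3))
```

In this toy collection the third document contains no query keyword, so it contributes nothing to either criterion. A term that is frequent but distant and a term that is rare but adjacent can both be non-dominated, which is the trade-off a Pareto-based assignment is meant to preserve.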
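Searching social media for relevant books by using both the publication information of books and user-generated social information has become a research hotspot for information retrieval systems. Most information retrieval systems are built on a single retrieval method, and as user needs keep growing these systems struggle to satisfy them. To address this problem, a book retrieval system based on re-ranking fusion is proposed. Pseudo-relevance feedback is used to expand the user's query, and the retrieved results serve as the initial ranking; features of user-generated social information are then used to re-rank the initial results; finally, a learning-to-rank model fuses the results obtained from the multiple re-ranking strategies. Comparative experiments against other state-of-the-art retrieval systems were carried out on the public INEX 2012-2014 Social Book Search datasets, and the results show that the system's performance (NDCG@10) is better than that of book retrieval systems built on other methods.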
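For readers unfamiliar with the reported metric, NDCG@10 can be sketched as follows. This is a generic illustration, not the evaluation code used in the experiments: the function names are hypothetical, gains are used linearly, and the ideal ranking is computed only from the gains of the list passed in, whereas an official evaluation would normally build it from all judged documents for the query.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of the same gains in ideal order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Relevance grades of the returned books, in ranked order (made-up values).
print(ndcg_at_k([3, 2, 1]))  # already in ideal order
print(ndcg_at_k([0, 1]))     # the only relevant item is ranked second
```

A ranking that already lists items in decreasing order of relevance scores 1.0, and pushing a relevant item further down the list lowers the score through the logarithmic discount.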