
Query Expansion Method Based on Generative Model and Pseudo-Relevance Feedback in Intelligent Retrieval
Abstract [Purpose/Significance] In retrieval systems, pseudo-relevance feedback relies too heavily on the initially retrieved document set, and generative models do not consider potential expansion terms in relevant documents. To address these issues, this paper proposes a query expansion model based on a generative model and pseudo-relevance feedback. [Method/Process] Combining the strengths of both approaches, the method produces one candidate expansion term set with a query generation model and another with pseudo-relevance feedback, then merges the two sets into the final expansion term set used for query expansion. The proposed method is validated with dense passage retrieval on two standard open-domain question answering datasets, NQ and TriviaQA. [Result/Conclusion] Experimental results show that the Top-k retrieval accuracy and exact match (EM) of the proposed model are higher than those of the baseline methods. Further experiments test the effects of the number of pseudo-relevance feedback query words, the context category of the generative model, and the question category on model performance, and the results support the effectiveness of the proposed method. The proposed method can improve the quality of query expansion terms and information retrieval performance.
Authors Qin Chunxiu; Lv Shuyue; Wang Yulong; Ma Xubu; Li Fan (School of Economics and Management, Xidian University, Xi'an 710071; Shaanxi Information Resources Research Center, Xi'an 710071)
Source Library and Information Service (《图书情报工作》), Peking University Core Journal, 2024, No. 15, pp. 117-127 (11 pages)
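Funding This work is one of the research outcomes of the National Social Science Fund of China Key Project "Research on Scenario-Driven Fine-Grained Organization and Precision Service Models for Literature Resources in China's Key and Core Fields" (Grant No. 22ATQ002).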
Keywords query expansion; text generation; pseudo-relevance feedback; information retrieval (IR)
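
The merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how terms are scored or combined, so the frequency-based selection of feedback terms, the simple ordered union, the small stopword list, and all function names here are assumptions made for illustration. The output of the generative model is represented by plain strings passed in by the caller, since the actual query generation model is not specified.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and", "to", "who", "what", "by"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def prf_terms(query, first_pass_docs, n_terms):
    """Pseudo-relevance feedback (hypothetical scoring): take the n most
    frequent terms in the first-pass documents that are not in the query."""
    query_tokens = set(tokenize(query))
    counts = Counter(
        tok
        for doc in first_pass_docs
        for tok in tokenize(doc)
        if tok not in query_tokens
    )
    return [term for term, _ in counts.most_common(n_terms)]


def generated_terms(query, generated_contexts):
    """Candidate terms from generated text (here just strings supplied by
    the caller, standing in for a generative model's output)."""
    query_tokens = set(tokenize(query))
    terms = []
    for context in generated_contexts:
        for tok in tokenize(context):
            if tok not in query_tokens and tok not in terms:
                terms.append(tok)
    return terms


def expand_query(query, generated_contexts, first_pass_docs, n_prf_terms=3):
    """Merge both candidate sets (ordered union) and append them to the query."""
    merged = []
    candidates = generated_terms(query, generated_contexts) + prf_terms(
        query, first_pass_docs, n_prf_terms
    )
    for term in candidates:
        if term not in merged:
            merged.append(term)
    return query + " " + " ".join(merged) if merged else query


if __name__ == "__main__":
    docs = [
        "Hamlet is a tragedy by William Shakespeare.",
        "The tragedy Hamlet was written around 1600.",
    ]
    print(expand_query("who wrote hamlet", ["William Shakespeare"], docs, n_prf_terms=1))
```

In this sketch the expanded query would then be passed to the retriever in place of the original one. The parameter `n_prf_terms` loosely corresponds to the number of pseudo-relevance feedback query words whose effect the abstract says was tested.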