
PWFT-BERT: A Retrieval Ranking Method Integrating Learning to Rank and Pre-Trained Model (cited by: 2)
Abstract: Information retrieval is the process of finding the information users need from a document collection or the Internet, and it is divided into two stages: recall and ranking. To address the re-ranking of relevant documents in the ranking stage, a retrieval ranking method that integrates learning to rank with a pre-trained model is proposed, called PWFT-BERT (Pair-Wise Fine-Tuned Bidirectional Encoder Representations from Transformers). An algorithm such as BM25 first recalls a small set of query-relevant documents from the candidate paper dataset, and PWFT-BERT then ranks the recalled set. To construct pair-wise training data, a pseudo-negative example generation algorithm is proposed, and a learning-to-rank method is used to fine-tune the pre-trained model so that it fits the ranking task. Compared with the TF-IDF and BM25 baselines, PWFT-BERT improves retrieval results on the WSDM-DiggSci 2020 dataset by 240% and 74%, respectively, which demonstrates the effectiveness of the proposed method.
Authors: SU Ke; HUANG Ruiyang; ZHANG Jianpeng; HU Nan; YU Shiyuan (Zhengzhou University, Zhengzhou 450001, China; Information Engineering University, Zhengzhou 450001, China)
Source: Journal of Information Engineering University, 2022, No. 4, pp. 460-466 (7 pages)
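Funding: Youth Fund of the National Natural Science Foundation of China (62002384); General Program of the China Postdoctoral Science Foundation (2020M683760).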
Keywords: natural language processing; information retrieval; learning to rank; pre-trained models; retrieval ranking
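
The recall-then-rank pipeline summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not spell out the pseudo-negative generation algorithm or the loss function, so the rule used here (treat every recalled document other than the labeled positive as a negative) and the RankNet-style logistic loss are assumptions, as are all function names and the toy data. The BM25 scorer uses the common non-negative IDF variant with assumed parameters k1 = 1.5 and b = 0.75.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, not from the paper).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recall_top_k(query, docs, k):
    """Recall stage: indices of the k highest-scoring documents."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def make_pairs(query, docs, positive_idx, k):
    """Hypothetical pseudo-negative rule: every recalled document that is
    not the labeled positive becomes a negative in a (pos, neg) pair."""
    recalled = recall_top_k(query, docs, k)
    return [(positive_idx, i) for i in recalled if i != positive_idx]


def pairwise_logistic_loss(s_pos, s_neg):
    """RankNet-style pair-wise loss: small when s_pos exceeds s_neg."""
    return math.log1p(math.exp(-(s_pos - s_neg)))


# Toy corpus (invented for illustration).
docs = [
    "pairwise fine tuning of bert for ranking".split(),
    "bm25 ranking function for keyword search".split(),
    "protein folding with deep networks".split(),
    "bert language model pre training".split(),
]
query = "bert pairwise ranking".split()

pairs = make_pairs(query, docs, positive_idx=0, k=3)
print(pairs)
```

In a full system the two scores passed to `pairwise_logistic_loss` would come from the pre-trained model applied to (query, positive) and (query, negative), and the loss would be minimized during fine-tuning; here the function only shows the shape of the objective.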
