Abstract
A text retrieval model for engineering consulting reports is proposed. It combines semantic matching and association matching to retrieve titles and paragraphs accurately and efficiently, and can effectively assist the writing of engineering consulting reports. First, a contrastive learning model is fine-tuned on a text retrieval corpus of engineering consulting reports, and the standard bidirectional encoder representations from transformers (Vanilla BERT) model is initialized. The Vanilla BERT model and a linear layer are then trained on the corpus text to obtain a semantic matching score. In parallel, sememe-based word vector representations are built for the text and keyword information, and a deep text interaction model produces an association matching score. The two scores are normalized and fused by weighting to give the final matching score, which completes the retrieval between titles and paragraphs. By combining contextual vector representations with text interaction matching, the proposed model improves the P@20 metric by 7.49% over the best baseline model, effectively enhancing text retrieval.
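The score fusion and the P@20 metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which normalization or fusion weight is used, so min-max normalization and the weight `alpha` below are assumptions, and all function names are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]. Min-max scaling is an assumption here;
    the abstract only says the scores are normalized."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates scored equally: no ranking signal from this matcher.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(semantic: Sequence[float],
                association: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted fusion of the two normalized scores for each candidate.
    alpha is a hypothetical weight on the semantic matching score."""
    if len(semantic) != len(association):
        raise ValueError("score lists must have the same length")
    sem = min_max_normalize(semantic)
    assoc = min_max_normalize(association)
    return [alpha * s + (1.0 - alpha) * a for s, a in zip(sem, assoc)]


def rank_candidates(fused: Sequence[float]) -> List[int]:
    """Return candidate indices ordered by fused score, best first."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """P@k: fraction of the top-k ranked candidates that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in ranked_relevance[:k] if r) / k


if __name__ == "__main__":
    # Toy scores for three candidate paragraphs of one title (made-up values).
    fused = fuse_scores([1.0, 2.0, 3.0], [10.0, 0.0, 5.0], alpha=0.5)
    print(rank_candidates(fused))
```

In this sketch a larger `alpha` favors the BERT-based semantic score and a smaller one favors the interaction-based association score; in practice the weight would be tuned on held-out data.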
Authors
ZHANG Le, DU Yifan, LYU Xueqiang, LI Yelong, XIA Lei
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Beijing Engineering Consulting Company, Beijing 100124, China
Source
Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》)
EI
CAS
CSCD
Peking University Core Journal
2024, No. 2, pp. 123-129 (7 pages)
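Funding
National Natural Science Foundation of China (62171043)
Key Project of the State Language Commission (ZDI145-10)
Scientific Research Program of the Beijing Municipal Education Commission (KM202311232001)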
Keywords
text retrieval
joint ranking
word vector
character vector
sememe