Abstract
Key phrase extraction is a foundational task in text knowledge extraction, but current methods suffer from unclear phrase boundaries and highly repetitive outputs, which lowers extraction accuracy. This paper models the positions at which key phrases occur in a document and proposes an end-to-end key phrase prediction model, built on the Transformer encoder-decoder architecture, that combines these position features with a pre-trained model. During training, a cross-entropy loss is used in which the Hungarian algorithm matches predictions to ground-truth phrases, so that prediction is not affected by the predetermined target ordering required by sequence generation methods and key phrases are extracted as a set. Experiments on the Inspec, SemEval2017, and KP20k datasets show that the model improves F1 over existing methods on all three, indicating that the approach helps improve key phrase extraction from text.
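The Hungarian-matched cross-entropy loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy setting in which each prediction slot outputs a probability distribution over a small set of phrase labels, and label 0 stands for "no phrase" padding. The function names `hungarian` and `set_cross_entropy` are hypothetical, and a real model would score token sequences rather than single labels.

```python
import math

def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix (Kuhn-Munkres, O(n^3)).
    Returns a list where entry i is the column assigned to row i."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    p = [0] * (n + 1)        # p[j] = row currently matched to column j, 0 = none
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0:            # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def set_cross_entropy(probs, targets):
    """Cross-entropy after optimally matching prediction slots to targets.
    probs[i][k] is slot i's probability of label k; targets is a list of
    label ids padded to len(probs) (toy assumption: label 0 = no phrase)."""
    n = len(probs)
    cost = [[-math.log(probs[i][targets[j]]) for j in range(n)] for i in range(n)]
    match = hungarian(cost)  # match[i] = index of the target given to slot i
    loss = sum(cost[i][match[i]] for i in range(n)) / n
    return loss, match

# Toy example: three slots, two real phrases (labels 1 and 2) plus padding.
probs = [[0.1, 0.8, 0.1],
         [0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
loss_a, match_a = set_cross_entropy(probs, [2, 1, 0])
loss_b, match_b = set_cross_entropy(probs, [0, 2, 1])  # same set, new order
```

Because the matching is recomputed for every target ordering, `loss_a` and `loss_b` coincide, which is the order-invariance property the abstract attributes to set-based extraction.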
Authors
YU Zijian (于子健)
SUN Haichun (孙海春)
LI Xin (李欣)
College of Information Network Security, People's Public Security University of China, Beijing 100038, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2023, Issue 2, pp. 20-28 (9 pages)
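Funding
National Key Research and Development Program of China (2020AAA0107700)
National Natural Science Foundation of China (62076246)
Ministry of Public Security Technology Research Program (2020JSYJC22)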