Abstract
Most current extractive summarization methods focus on representing and extracting summary sentences, and tend to neglect whether the text features themselves are adequately represented. To address this problem, an extractive summarization method based on metric learning and a hierarchical inference network is proposed. First, building on the extractive task, an extractive summarization model based on metric learning and hierarchical inference (MLHIN) is proposed. Second, the model is evaluated and tested on the English summarization dataset CNN/DailyMail. Finally, the test results are verified. The results show that the proposed model scores significantly higher than the other models on Rouge-1, Rouge-2, and Rouge-L, exceeding the Lead-3 model by 0.84%, 1.29%, and 2.43%, respectively. When the proposed metric loss and the sentence encoder of the hierarchical inference model are each replaced with other modules, model performance declines to varying degrees, which demonstrates the effectiveness of the proposed hierarchical inference network and metric loss. The new algorithm improves the model's ability to capture long-distance dependencies, strengthens its ability to distinguish summary sentences from non-summary sentences, and effectively improves the performance of extractive summarization.
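For readers unfamiliar with the comparison reported above, the following is a minimal sketch of the Lead-3 baseline (take the first three sentences of a document as its summary) and a simplified ROUGE-N F1 score. It is an illustration only, not the authors' evaluation code: the function names `lead_k`, `ngrams`, and `rouge_n` are hypothetical, tokenization is plain lowercase whitespace splitting, and no stemming is applied, so its scores will not match those of the official ROUGE toolkit.

```python
from collections import Counter


def lead_k(sentences, k=3):
    """Lead-k baseline: the first k sentences of the document are the summary."""
    return sentences[:k]


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N F1 (assumption: lowercase whitespace tokens, no stemming)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy document and reference summary, invented for illustration.
    document = [
        "The council approved the new budget on Monday.",
        "The budget increases school funding.",
        "Opposition members criticized the plan.",
        "A final vote is expected next month.",
    ]
    reference = "The council approved a budget that increases school funding."
    summary = " ".join(lead_k(document))
    print("Rouge-1 F1:", rouge_n(summary, reference, n=1))
    print("Rouge-2 F1:", rouge_n(summary, reference, n=2))
```

An extractive model such as the one described would replace `lead_k` with a learned sentence selector, while the scoring step stays the same.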
Authors
成悦
赵康
勾智楠
高凯
CHENG Yue; ZHAO Kang; GOU Zhinan; GAO Kai (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China; School of Information Technology, Hebei University of Economics and Business, Shijiazhuang, Hebei 050061, China)
Source
《河北科技大学学报》
CAS
PKU Core Journal (北大核心)
2022, No. 6, pp. 594-601 (8 pages)
Journal of Hebei University of Science and Technology
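Funding
Natural Science Foundation of Hebei Province (F2022208006)
Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2020198)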
Keywords
natural language processing
sentence encoder
document encoder
metric learning
hierarchical inference
extractive text summarization