Extractive Document Summarization Model Based on Heterogeneous Graph and Keywords
Abstract Extractive document summarization selects sentences from a long text to form a summary; its key challenge is to exploit as much of the text's semantic and structural information as possible. To mine this information more effectively and use it to guide extraction, an extractive document summarization model based on a heterogeneous graph and keywords (HGKSum) is proposed. The model first represents the text as a heterogeneous graph composed of sentence nodes and word nodes, and uses a graph attention network on this graph to learn node features. Keyword extraction is then treated as an auxiliary task of the summarization task, and the model is trained with multi-task learning to produce a candidate summary. Because the candidate summary derived from the network's predictions is often highly redundant, the model refines it to obtain a final summary with low redundancy. Comparative experiments on a document summarization benchmark show that the proposed model outperforms the baseline models, and ablation studies demonstrate the necessity of introducing heterogeneous nodes and keywords.
Authors ZHU Qilin (朱颀林); WANG Yu (王羽); XU Jian (徐建) (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410003, China; The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007, China)
Source Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2024, No. 2, pp. 259-270 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
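Funding National Natural Science Foundation of China (61872186); Stable Support Project for National Defense Science and Technology Key Laboratories, National Defense Basic Scientific Research Program (WDZC20225250405).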
Keywords extractive document summarization; heterogeneous graph; keywords; graph attention network; multi-task learning
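
To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of its steps: building a word-sentence heterogeneous graph and refining a scored candidate summary to reduce redundancy. It is not the authors' implementation. The function names, the whitespace tokenizer, the use of occurrence counts as edge weights, and the use of trigram blocking as the refinement rule are all assumptions made for illustration; the abstract does not specify these details. The graph attention network and the multi-task training are omitted, so the sentence scores are simply passed in as placeholder inputs.

```python
from collections import defaultdict


def build_heterogeneous_graph(sentences):
    """Build a bipartite word-sentence graph (hypothetical helper).

    Returns (word_nodes, edges), where edges maps (word, sentence_index)
    to the number of times the word occurs in that sentence.
    Whitespace tokenization is an assumption for illustration.
    """
    edges = defaultdict(int)
    for idx, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            edges[(token, idx)] += 1
    word_nodes = sorted({word for word, _ in edges})
    return word_nodes, dict(edges)


def trigrams(sentence):
    """Return the set of word trigrams in a sentence."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def refine_candidates(sentences, scores, max_sentences=3):
    """Greedy refinement with trigram blocking (an assumed technique).

    Sentences are visited from highest to lowest score; a sentence is
    skipped if it shares any trigram with an already selected sentence.
    Returns the selected sentence indices in document order.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected, seen = [], set()
    for i in ranked:
        grams = trigrams(sentences[i])
        if grams & seen:
            continue  # overlaps with the summary so far, treat as redundant
        selected.append(i)
        seen |= grams
        if len(selected) == max_sentences:
            break
    return sorted(selected)


sentences = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "dogs bark loudly at night",
]
# Placeholder scores standing in for the model's sentence predictions.
scores = [0.9, 0.8, 0.5]

word_nodes, edges = build_heterogeneous_graph(sentences)
summary_indices = refine_candidates(sentences, scores)
```

In this toy input, the second sentence repeats a trigram from the higher-scored first sentence, so the refinement step drops it and keeps the first and third sentences.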