Automatic Summarization Model for Aerospace News Based on a Domain Concept Graph
Abstract: The vast volume of aerospace news on the Internet contains a large amount of aerospace intelligence, and understanding and compressing it is the basis for improving the efficiency of subsequent intelligence analysis. However, general-purpose automatic summarization algorithms tend to overlook much key aerospace-domain information, and supervised automatic summarization algorithms require extensive annotation of domain text, which is time-consuming and laborious. An unsupervised automatic summarization model based on a domain concept graph (DCG-TextRank) is therefore proposed, which uses domain terms to guide graph ranking and improve the model's understanding of domain text. The model consists of three modules: domain concept graph generation, graph weight initialization, and graph ranking with semantic filtering. First, the text is converted into a domain concept graph containing sentence nodes and domain term nodes, based on sentence vector similarity and a domain term database. Second, the weights of the domain concept graph are initialized according to the textual features of aerospace news. Third, the TextRank model ranks the sentences, and the semantic filtering module refines the TextRank output by clustering graph nodes and setting a semantic retention level for the summary, which preserves the multiple semantic threads of the text while reducing redundancy. The proposed model is portable across domains. Experimental results on an aerospace news dataset show that it outperforms the traditional TextRank model by 14.97% and the supervised extractive summarization models BertSum and MatchSum by 4.37% to 12.97%.
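The three modules described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words vector stands in for sentence vectors from a pre-trained language model, the uniform `term_weight` stands in for the weight initialization based on aerospace news features, and the similarity threshold `max_sim` is a simple substitute for graph node clustering with a semantic retention setting. All function names, parameters, and example sentences are hypothetical.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Toy sentence vector: bag-of-words counts (assumed stand-in for a pre-trained encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_concept_graph(sentences, terms, term_weight=1.0):
    """Symmetric weight matrix over sentence nodes followed by domain term nodes."""
    n, m = len(sentences), len(terms)
    vecs = [bow_vector(s) for s in sentences]
    w = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        # Sentence-sentence edges weighted by vector similarity.
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(vecs[i], vecs[j])
        # Sentence-term edges when the term occurs in the sentence (assumed uniform weight).
        for k, term in enumerate(terms):
            if term.lower() in sentences[i].lower():
                w[i][n + k] = w[n + k][i] = term_weight
    return w


def textrank(w, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration, as used in TextRank."""
    size = len(w)
    out_sum = [sum(row) for row in w]
    scores = [1.0 / size] * size
    for _ in range(iterations):
        scores = [
            (1 - damping) / size
            + damping * sum(w[j][i] / out_sum[j] * scores[j]
                            for j in range(size) if out_sum[j] > 0)
            for i in range(size)
        ]
    return scores


def summarize(sentences, terms, k=2, max_sim=0.5):
    """Rank sentences, then skip any sentence too similar to one already chosen."""
    w = build_concept_graph(sentences, terms)
    scores = textrank(w)[:len(sentences)]  # keep sentence-node scores only
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(w[i][j] <= max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]  # restore document order


SENTENCES = [
    "The rocket launched a satellite into orbit",
    "The rocket launched a satellite into orbit today",
    "Officials praised the weather",
    "The satellite will study solar wind",
]
TERMS = ["satellite", "rocket"]

if __name__ == "__main__":
    for line in summarize(SENTENCES, TERMS, k=2):
        print(line)
```

In this sketch, sentences that mention domain terms gain extra edges through the term nodes, so they tend to rank higher than sentences with no domain vocabulary, while the threshold keeps near-duplicate sentences from both entering the summary.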
Authors: HUANG Haoning (黄浩宁); CHEN Zhimin (陈志敏); XU Cong (徐聪); ZHANG Xiaoyan (张晓燕) (National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Radio Monitoring Center, Beijing 100037, China)
Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2024, No. 1, pp. 317-327 (11 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (91738101); National Key Research and Development Program of China (2020YFB1807900).
Keywords: automatic text summarization; domain concept graph; pre-trained language model; graph ranking algorithm; graph node clustering