
Research on automatic text summarization combining topic features (cited by: 5)
Abstract Traditional graph-model methods for text summarization consider only statistical features or shallow semantic features, and do not mine or exploit deep topic-level semantic features. To address this, the paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that scores sentences along multiple dimensions after incorporating topic features. First, the LDA topic model is used to mine the topic semantic information of the text, and a topic importance measure is defined to capture how topic features affect the importance of a sentence. Next, topic features, statistical features, and inter-sentence similarity are combined to improve the way the probability transition matrix over the graph-model nodes is constructed. Finally, the summary is extracted and evaluated according to the weights of the sentence nodes. Experimental results show that MDSR achieves its best ROUGE scores when the weights of the topic feature, statistical features, and inter-sentence similarity are in the ratio 3:4:3. At that setting, ROUGE-1, ROUGE-2, and ROUGE-SU4 reach 53.35%, 35.18%, and 33.86%, respectively, outperforming the comparison methods. This indicates that incorporating topic features effectively improves the accuracy of summary extraction.
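The abstract does not give the formulas for topic importance or for the modified transition matrix, so the following is only a minimal sketch of the general idea under stated assumptions: each row of the transition matrix is a weighted mix of three probability distributions (normalized topic scores, normalized statistical scores, and row-normalized inter-sentence similarity), and sentence weights come from a PageRank-style power iteration. The function names, the damping factor of 0.85, and the way the three features are normalized are illustrative choices, not the authors' definitions. The per-sentence topic and statistical scores are assumed to be computed elsewhere (for example, from an LDA model and from position or term-frequency features).

```python
def normalize(values):
    """Scale non-negative values to sum to 1; fall back to uniform if all are zero."""
    n = len(values)
    total = sum(values)
    if total <= 0:
        return [1.0 / n] * n
    return [v / total for v in values]


def transition_matrix(similarity, topic_scores, stat_scores, weights=(3, 4, 3)):
    """Build a row-stochastic matrix mixing topic, statistical, and similarity terms.

    Assumption: the 3:4:3 ratio from the abstract is applied as a convex
    combination of three distributions over target sentences.
    """
    w_topic, w_stat, w_sim = normalize(list(weights))
    n = len(similarity)
    topic = normalize(topic_scores)
    stat = normalize(stat_scores)
    matrix = []
    for i in range(n):
        # Ignore self-similarity so a sentence does not vote for itself via this term.
        sim_row = normalize([similarity[i][j] if j != i else 0.0 for j in range(n)])
        matrix.append([
            w_topic * topic[j] + w_stat * stat[j] + w_sim * sim_row[j]
            for j in range(n)
        ])
    return matrix


def rank_sentences(matrix, damping=0.85, max_iter=200, tol=1e-10):
    """PageRank-style power iteration; returns one weight per sentence node."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


def summarize(sentences, scores, k):
    """Pick the k highest-weighted sentences and return them in document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy input: three sentences with made-up feature values (hypothetical data).
sentences = ["s0", "s1", "s2"]
similarity = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
topic_scores = [0.6, 0.3, 0.1]
stat_scores = [0.5, 0.3, 0.2]

matrix = transition_matrix(similarity, topic_scores, stat_scores)
scores = rank_sentences(matrix)
summary = summarize(sentences, scores, k=2)
```

Because every row is a convex combination of probability distributions, the matrix stays row-stochastic, and the iteration keeps the sentence weights summing to 1. Changing `weights` is one way to explore ratios other than 3:4:3.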
Authors Luo Fang (罗芳); Wang Jinghang (汪竞航); He Daosen (何道森); Pu Qiumei (蒲秋梅) (School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China; Dept. of Supply Chain & Information Management, Hang Seng University of Hong Kong, Hong Kong 999077, China; School of Information Engineering, Minzu University of China, Beijing 100081, China)
Source Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 1, pp. 129-133 (5 pages)
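Funding Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (18YJAZH087); Independent Innovation Research Fund of Wuhan University of Technology (3120600100).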
Keywords TextRank; text summarization; semantic features; topic model (LDA); probability transition matrix