
Research on News Text Summarizations Generation Based on MMR and WordNet
Abstract Traditional extractive algorithms for news text summarization have several problems: they summarize the text content incompletely, produce redundant summaries, and ignore synonymy (different words with the same meaning) when extracting keywords. To address these problems, a news text summary generation algorithm named WMMR is proposed, based on Maximal Marginal Relevance (MMR) and the lexical semantic network WordNet. To optimize the sentence score in the MMR algorithm, WMMR jointly considers the influence of text similarity, keywords, sentence position information, clue words, and other features on sentence weight, and it introduces WordNet to merge synonyms when calculating keyword scores. The effectiveness of the algorithm is verified on the NLPCC2017 public dataset, where the ROUGE value of WMMR is 4 percentage points higher than that of TextRank and 7 percentage points higher than that of MMR. The generality of the algorithm is verified on the Shence Cup 2018 and SogouCS public datasets, where the ROUGE value of WMMR improves on those of traditional algorithms such as TextRank and MMR, indicating that WMMR effectively improves the quality of the generated summaries.
Authors ZHANG Qi; FAN Yongsheng; JIN Duliang (College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China)
Source Journal of Southwest China Normal University (Natural Science Edition), CAS, 2023, No. 5, pp. 77-86 (10 pages)
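Funding Chongqing Normal University (Talent Introduction/Doctoral Start-up) Fund Project (17XCB008); Humanities and Social Sciences Research Project of the Ministry of Education (18XJC880002); Science and Technology Project of the Chongqing Municipal Education Commission (KJQN201800539).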
Keywords news text summarization; extractive algorithm; maximal marginal relevance algorithm; WordNet; synonyms of different words
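
The selection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WMMR implementation: the abstract does not give the scoring formula, so the feature weights, the top-5 keyword cutoff, the `lam` trade-off parameter, and all function names are assumptions made for illustration. The small `SYNONYMS` table is a hypothetical stand-in for a real WordNet lookup.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {"automobile": "car", "vehicle": "car"}

def normalize(tokens, synonyms=SYNONYMS):
    """Map each token to a canonical form so synonyms are counted together."""
    return [synonyms.get(t, t) for t in tokens]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def sentence_scores(sentences, cue_words=(), weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted mix of similarity, keyword, position and cue-word features.

    The weights are illustrative, not values taken from the paper.
    """
    bags = [Counter(normalize(s)) for s in sentences]
    doc = sum(bags, Counter())                   # whole-document bag of words
    top = {w for w, _ in doc.most_common(5)}     # crude keyword set (assumption)
    w_sim, w_kw, w_pos, w_cue = weights
    n = len(sentences)
    scores = []
    for i, bag in enumerate(bags):
        sim = cosine(bag, doc)                           # similarity to document
        kw = len(top & set(bag)) / len(top) if top else 0.0
        pos = 1.0 - i / n                                # earlier sentences score higher
        cue = 1.0 if any(c in bag for c in cue_words) else 0.0
        scores.append(w_sim * sim + w_kw * kw + w_pos * pos + w_cue * cue)
    return bags, scores

def mmr_select(sentences, k=2, lam=0.7, cue_words=()):
    """Greedy MMR: trade sentence score against redundancy with chosen sentences."""
    bags, scores = sentence_scores(sentences, cue_words)
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((cosine(bags[i], bags[j]) for j in selected),
                             default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)                      # indices in document order

docs = [["car", "crash", "city"],
        ["automobile", "crash", "city"],
        ["rain", "flood", "river"]]
print(mmr_select(docs, k=2))                     # prints [0, 2]
```

In this toy input the second sentence becomes identical to the first once "automobile" is merged into "car", so the redundancy penalty steers the second pick to the third sentence. With `lam=1.0` the penalty is switched off and the two near-duplicate sentences are chosen instead.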