A Text Abstract Summarization Model Combined with Theme Information Clustering Coding (结合主题信息聚类编码的文本摘要模型)

Cited by: 2
Abstract: Sequence-to-sequence models with attention are widely used in abstractive text summarization, but summaries generated this way still suffer from insufficient information encoding and from drifting off topic. To address this, the paper proposes TICTS (theme information clustering coding text summarization), a summarization model that combines topic information clustering and encoding. The model joins traditional extractive summarization with deep-learning-based abstractive summarization: a clustering algorithm over word vectors extracts the topic information, cosine similarity measures the topical relevance between the input text and the extracted key information, and that relevance serves as the weight of the topic encoding used to adjust the attention mechanism. The summary is then generated by a sequence-to-sequence model that combines the topic information with attention. On the LCSTS dataset, evaluated with ROUGE, the model improves on the baseline by 1.1 on ROUGE-1, 1.3 on ROUGE-2, and 1.1 on ROUGE-L. The experiments indicate that summaries generated with topic information clustering and encoding stay closer to the topic and are of higher quality.
Authors: WEI Yuan-yuan (魏媛媛); NI Jian-cheng (倪建成); GAO Feng (高峰); WU Jun-qing (吴俊清) (School of Software, Qufu Normal University, Jining 272000, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2021, Issue 1, pp. 30-34 (5 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (61601261); Shandong Province Graduate Education Quality Improvement Program (SDYY17136)
Keywords: sequence-to-sequence model; abstractive text summarization; word vector clustering; topic encoding; cosine similarity
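
The abstract outlines a pipeline: cluster word vectors to extract topic information, score topical relevance with cosine similarity, and use that score as a weight on the topic encoding that adjusts attention. The record does not give the paper's equations, so the following is only a rough sketch under stated assumptions, not the TICTS implementation. The function names (`kmeans`, `cosine`, `topic_weighted_attention`), the use of plain k-means, the cluster centroids as topic vectors, the dot-product attention, and the additive topic term are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means over the rows of x; returns k centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct input rows (fancy indexing returns a copy).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def topic_weighted_attention(word_vecs, enc_states, dec_state, k=3):
    """Assumed scheme: centroids act as topic vectors, cosine relevance
    weights them, and the weighted topic vector biases attention scores."""
    centroids = kmeans(word_vecs, k)
    text_vec = word_vecs.mean(axis=0)  # assumed representation of the input text
    relevance = np.array([cosine(text_vec, c) for c in centroids])
    topic_vec = softmax(relevance) @ centroids  # relevance-weighted topic encoding
    # Dot-product attention scores plus an additive topic term (assumption).
    scores = enc_states @ dec_state + enc_states @ topic_vec
    attn = softmax(scores)
    context = attn @ enc_states
    return attn, context


# Toy inputs: 12 tokens, 8-dimensional vectors, random values for illustration.
rng = np.random.default_rng(1)
word_vecs = rng.normal(size=(12, 8))
enc_states = rng.normal(size=(12, 8))
dec_state = rng.normal(size=8)
attn, context = topic_weighted_attention(word_vecs, enc_states, dec_state)
print(attn.round(3), context.shape)
```

The sketch assumes word vectors and encoder states share one dimensionality so the topic vector can be scored against encoder states directly; a real model would more likely learn a projection between the two spaces.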