Abstract
Sequence-to-sequence models with attention are widely used in abstractive text summarization, but summaries generated this way still suffer from insufficient information encoding and tend to drift away from the topic of the source text. To address this, we propose TICTS (theme information clustering coding text summarization), a summarization model that incorporates cluster-based encoding of topic information. The model combines traditional extractive summarization with deep-learning-based abstractive summarization: a clustering algorithm over word vectors extracts the topic information, and the cosine similarity between the input text and the extracted key information measures their topical relevance. This relevance serves as the weight of the topic encoding and is used to adjust the attention mechanism, so that the sequence-to-sequence model generates the summary from both topic information and attention. In experiments on the LCSTS dataset, evaluated with ROUGE, the model improves on the baseline by 1.1 in ROUGE-1, 1.3 in ROUGE-2, and 1.1 in ROUGE-L. The results indicate that summaries generated with clustered topic encoding adhere more closely to the topic and are of higher quality.
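The abstract does not give the formula by which topic relevance adjusts attention, so the following is only a minimal sketch under stated assumptions. It takes the centroid of a few keyword vectors as a stand-in for the output of word-vector clustering, scores each input token by its cosine similarity to that centroid, and rescales softmax attention weights by `1 + similarity` before renormalizing. The function names, the toy vectors, and the rescaling rule are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Centroid of a list of vectors; stands in for a cluster center."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_weighted_attention(scores, token_vectors, topic_vector):
    """Rescale attention by topic relevance, then renormalize.

    ASSUMPTION: the factor (1 + cosine) is an illustrative choice,
    not the rule used by TICTS.
    """
    base = softmax(scores)
    scaled = [
        weight * (1.0 + cosine(vec, topic_vector))
        for weight, vec in zip(base, token_vectors)
    ]
    total = sum(scaled)
    # Fall back to plain attention if every token is anti-aligned with the topic.
    return [s / total for s in scaled] if total > 0 else base


# Toy 2-dimensional "word vectors" (hypothetical values).
keyword_vectors = [[1.0, 0.0], [0.9, 0.1]]
topic = mean_vector(keyword_vectors)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = topic_weighted_attention([0.0, 0.0, 0.0], tokens, topic)
```

With equal raw scores, the adjusted weights still sum to 1, and the token closest to the topic centroid receives the largest share, which is the qualitative behavior the abstract describes.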
Authors
魏媛媛
倪建成
高峰
吴俊清
WEI Yuan-yuan; NI Jian-cheng; GAO Feng; WU Jun-qing (School of Software, Qufu Normal University, Jining 272000, China)
Source
《计算机技术与发展》
2021, No. 1, pp. 30-34 (5 pages)
Computer Technology and Development
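Funding
National Natural Science Foundation of China, Young Scientists Fund (61601261)
Shandong Province Graduate Education Quality Improvement Plan Project (SDYY17136)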
Keywords
sequence-to-sequence model
abstractive text summarization
word vector clustering
topic encoding
cosine similarity