Abstract
To address the problem that Chinese summaries generated by traditional abstractive models often fail to preserve the semantic information of the source text, this paper proposes an abstractive Chinese summarization model that incorporates linguistic features. A concatenation layer is added to the model; it appends features such as part of speech, named entity, word position, and TF-IDF to the word vectors, so that the word vectors fed into the model carry more dimensions of semantic information for identifying key entities. Combined with a pointer mechanism, the model selectively copies keywords from the source text into the summary, which improves the semantic relevance of the generated summaries. Experiments on the LCSTS news dataset yield ROUGE scores higher than those of the baseline model. The analysis shows that the model can generate Chinese summaries with high semantic relevance.
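The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag sets `POS_TAGS` and `NER_TAGS`, the helper names, the relative-position encoding, the smoothed TF-IDF variant, and the pointer-generator style mixing in `final_distribution` are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math

# Hypothetical tag sets; the record does not list the tags actually used.
POS_TAGS = ["n", "v", "adj", "other"]
NER_TAGS = ["O", "PER", "LOC", "ORG"]


def one_hot(value, vocabulary):
    """Return a one-hot list for value; unknown values give all zeros."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def tf_idf(term, doc, corpus):
    """TF-IDF with add-one smoothing on the document frequency (assumed variant)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log((1 + len(corpus)) / (1 + df))


def enrich(embedding, pos, ner, index, doc, corpus):
    """Concatenate POS, NER, relative position and TF-IDF onto a word vector."""
    position = index / max(len(doc) - 1, 1)  # 0.0 for the first word, 1.0 for the last
    return (list(embedding)
            + one_hot(pos, POS_TAGS)
            + one_hot(ner, NER_TAGS)
            + [position, tf_idf(doc[index], doc, corpus)])


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution over source words.

    p_gen weights generating from the vocabulary; (1 - p_gen) weights copying
    a source token in proportion to its attention weight.
    """
    mixed = {word: p_gen * p for word, p in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Toy usage with made-up words, tags and probabilities.
doc = ["上海", "发布", "新", "政策"]
corpus = [doc, ["北京", "发布", "报告"]]
vector = enrich([0.1, 0.2], "n", "LOC", 0, doc, corpus)
dist = final_distribution(0.5, {"发布": 0.6, "政策": 0.4},
                          [0.7, 0.1, 0.1, 0.1], doc)
```

In this sketch a 2-dimensional embedding grows to 12 dimensions after concatenation, and a source word missing from the generation vocabulary (here "上海") still receives probability mass through the copy term, which is the behavior the abstract attributes to the pointer mechanism.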
Authors
Hu Demin (胡德敏)
Wang Rongrong (王荣荣)
School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core (Peking University Core Journals)
2020, No. 2, pp. 351-354, 369 (5 pages)
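Funding
National Natural Science Foundation of China (61170227, 61472256)
Key Scientific Research Innovation Project of the Shanghai Municipal Education Commission (12zz17)
Shanghai First-Class Discipline Construction Project (S1201YLXK)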
Keywords
abstractive summarization model
linguistic features
key entities
word vector