Abstract
Summaries produced by TextRank, a graph-based method, never stray from the content of the source document. When text features are extracted, however, traditional word-vector methods suffer from the polysemy problem, whereas BERT-based word vectors fully exploit the semantic information in the text and alleviate it. An experimental comparison of different word-embedding methods verifies the effectiveness of the BERT model. Similarity measures based on word-frequency statistics likewise ignore the semantic information of sentences, so this paper adopts a vector-based similarity measure for summary generation. Finally, experiments on the TTNews dataset show a clear improvement in results.
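The pipeline described in the abstract (sentence vectors, vector-based similarity, TextRank ranking) can be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: cosine similarity is assumed as the vector-based measure, the function names and the damping factor of 0.85 are hypothetical choices, and the small hand-written vectors stand in for sentence embeddings that would come from BERT in practice.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between sentence vectors (assumed measure)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0          # avoid division by zero for empty vectors
    unit = vectors / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)         # no self-loops in the sentence graph
    return np.clip(sim, 0.0, None)     # keep edge weights non-negative


def textrank_scores(sim, damping=0.85, max_iter=100, tol=1e-6):
    """Weighted PageRank-style iteration over the similarity graph."""
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Normalise each row; isolated sentences spread their weight uniformly.
    transition = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


def summarize(sentences, vectors, top_k=2):
    """Return the top_k highest-ranked sentences in their original order."""
    vectors = np.asarray(vectors, dtype=float)
    scores = textrank_scores(cosine_similarity_matrix(vectors))
    chosen = sorted(int(i) for i in np.argsort(-scores)[:top_k])
    return [sentences[i] for i in chosen]


# Toy placeholder data: in practice each vector would be a BERT sentence embedding.
toy_sentences = ["s0", "s1", "s2", "s3"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
toy_summary = summarize(toy_sentences, toy_vectors, top_k=2)
```

Because the summary is assembled only from sentences that already exist in the input, it cannot drift away from the document, which is the property of TextRank the abstract highlights.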
Author
HUANG Feifei (黄菲菲) (Henan University of Economics and Law, Zhengzhou 450046, China)
Source
Modern Information Technology (《现代信息科技》)
2022, No. 2, pp. 91-95, 100 (6 pages in total)
Funding
Young Scientists Fund Project (61806073)
Henan Province Science and Technology Research Project (222102210339).