Abstract
To address inadequate feature mining in automatic text summarization, this paper proposes a summarization system based on a multi-feature fusion model that uses four sentence features: vocabulary, relative position, length, and inter-sentence similarity. The lexical feature is derived from the syntactic tree, so it makes full use of grammatical information and overcomes the limitations of traditional keyword extraction. The relative position feature assigns each sentence a score using higher-order information about its position. The length feature filters out overly long sentences. Sentence vectors are built with the smooth inverse frequency (SIF) sentence embedding method, which allows the similarity between sentences to be computed effectively. Experimental results show that the system improves the accuracy of automatic text summarization.
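The SIF step named in the abstract can be illustrated with a minimal sketch. This follows the generally published SIF recipe (a weighted average of word vectors, then removal of the first principal direction) and is not taken from the paper itself. The function names, the toy word vectors, the smoothing constant `a`, and the choice to estimate word probabilities from the input sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def weighted_average(sentences, word_vecs, a=1e-3):
    """SIF step 1: average word vectors with weight a / (a + p(w)).
    p(w) is estimated from the input sentences (an assumption; a large
    reference corpus would normally supply these frequencies)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    dim = len(next(iter(word_vecs.values())))
    embs = []
    for sent in sentences:
        known = [t for t in sent if t in word_vecs]
        vec = [0.0] * dim
        for tok in known:
            weight = a / (a + counts[tok] / total)
            for i, x in enumerate(word_vecs[tok]):
                vec[i] += weight * x
        embs.append([x / max(len(known), 1) for x in vec])
    return embs

def principal_direction(embs, iters=100):
    """Approximate the first right singular vector by power iteration."""
    dim = len(embs[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        proj = [dot(row, u) for row in embs]  # X u
        nxt = [sum(p * row[i] for p, row in zip(proj, embs))
               for i in range(dim)]           # X^T (X u)
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u

def remove_common_component(embs):
    """SIF step 2: subtract each vector's projection onto that direction."""
    u = principal_direction(embs)
    out = []
    for v in embs:
        p = dot(v, u)
        out.append([x - p * y for x, y in zip(v, u)])
    return out

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

# Hypothetical 3-dimensional word vectors, for illustration only.
word_vecs = {
    "the": [0.5, 0.5, 0.5], "cat": [1.0, 0.2, 0.0], "dog": [0.9, 0.3, 0.1],
    "stock": [0.1, 1.0, 0.8], "market": [0.0, 0.9, 1.0],
}
sentences = [["the", "cat"], ["the", "dog"], ["the", "stock", "market"]]
sif = remove_common_component(weighted_average(sentences, word_vecs))
similarity = cosine(sif[0], sif[1])
```

In a summarizer of the kind described, a similarity such as `similarity` would be one of several sentence-level scores. How the paper combines it with the other three features is not stated in the abstract.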
Authors
WU Shi-xin (吴世鑫); HUANG De-gen (黄德根); ZHANG Yun-xia (张云霞)
College of Computer Science and Technology, Dalian University of Technology, Dalian 116000, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2020, No. 3, pp. 650-655 (6 pages)
Keywords
text summarization
multi-feature combination
syntactic tree
smooth inverse frequency (SIF) sentence embedding
semantic similarity