Abstract
Research on multi-document summarization (MDS) is one of the key problems in natural language processing. To make extracted summaries better reflect the topics of a document set, this paper proposes, on the basis of sub-topic division, a sentence selection method that incorporates sentential semantic features (SSF). Based on a sentential semantic model (SSM), the method extracts features such as topic and predicate from the sentential semantic structure, merges them with statistical features into a feature vector, and uses that vector to compute sentence weights. Summary sentences are then extracted by combining comprehensive weighted selection with maximal marginal relevance (MMR). In experiments on text sets covering different topics, with a summary compression ratio of 15%, the system summaries reach an average precision of 66.7% and an average recall of 65.5%. The results indicate that introducing SSF can effectively improve multi-document summarization.
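The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`topic`, `predicate`, `position`), the coefficients, the bag-of-words cosine similarity, and the trade-off parameter `lam` are all assumptions chosen for illustration, and sub-topic division is omitted. The sketch scores each sentence as a weighted sum of feature values, then picks sentences greedily with the standard MMR criterion so that a highly weighted but redundant sentence is passed over.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sentence_weight(features, coeffs):
    """Weighted sum of feature scores (feature names are hypothetical)."""
    return sum(coeffs[name] * features.get(name, 0.0) for name in coeffs)


def mmr_select(sentences, weights, k, lam=0.7):
    """Greedy MMR: balance sentence weight against redundancy with picks so far."""
    bags = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max(
                (cosine(bags[i], bags[j]) for j in selected), default=0.0
            )
            return lam * weights[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
sentences = [
    "the flood damaged the bridge",
    "the flood damaged the old bridge",
    "rescue teams arrived on monday",
]
features = [
    {"topic": 1.0, "predicate": 1.0, "position": 1.0},
    {"topic": 1.0, "predicate": 1.0, "position": 0.5},
    {"topic": 0.5, "predicate": 0.5, "position": 0.5},
]
coeffs = {"topic": 0.4, "predicate": 0.3, "position": 0.3}  # assumed values
weights = [sentence_weight(f, coeffs) for f in features]

picked = mmr_select(sentences, weights, k=2)
print(picked)  # [0, 2]
```

In this toy run the second sentence has the second-highest weight but is skipped because it nearly repeats the first. Setting `lam=1.0` disables the redundancy penalty and reduces the procedure to ranking by weight alone. In a real system `k` would presumably be derived from the target compression ratio.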
Authors
罗森林
白建敏
潘丽敏
韩磊
孟强
LUO Shen-lin, BAI Jian-min, PAN Li-min, HAN Lei, MENG Qiang (School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China)
Source
《北京理工大学学报》
EI
CAS
CSCD
PKU Core Journals
2016, No. 10, pp. 1059-1064 (6 pages)
Transactions of Beijing Institute of Technology
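Funding
Project funded by the National "242" Program (2005C48)
Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Special Fund (2011CX01015)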
Keywords
multi-document automatic summarization
sentential semantic model
sentential semantic feature
natural language processing