Abstract
To improve the quality of automatic summarization, a graph-based word-sentence co-ranking algorithm is investigated. Automatic summarization extracts a number of important sentences from the original document to form an extractive summary, and sentence ranking is a typical way of doing so. Most existing work builds a network of words or of sentences and then applies PageRank to compute the ranking scores of the nodes. Taking the mutual influence between words and sentences into account, a word-sentence co-ranking algorithm is proposed: the influence of words is incorporated into the ranking scores on the sentence network, while the importance of each word is determined by the ranking scores of the sentences that contain it. On top of the sentence ranking result, a redundancy-based sentence selection method is proposed to further improve summary quality. Experimental results on ten Chinese documents show that, compared with ranking on the sentence network alone, the proposed method effectively improves the precision and recall of the generated summaries.
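The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes pre-tokenized sentences that are all non-empty, cosine similarity as the sentence-network edge weight, a linear mix (`alpha`) between the PageRank-style graph term and the word term, and a fixed similarity threshold (`max_sim`) as the redundancy test. The names `cosine`, `co_rank`, `select`, `alpha`, and `max_sim` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca if t in cb)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def co_rank(sentences, damping=0.85, alpha=0.5, iters=50):
    """Return (sentence_scores, word_scores) after a fixed number of iterations.

    Assumes every sentence is a non-empty list of tokens.
    """
    n = len(sentences)
    sim = [[cosine(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]  # total edge weight leaving each sentence
    vocab = sorted({t for s in sentences for t in s})
    s_score = [1.0 / n] * n
    w_score = {t: 1.0 / len(vocab) for t in vocab}
    for _ in range(iters):
        # Word step: a word inherits the scores of the sentences that contain it.
        raw = {t: 0.0 for t in vocab}
        for i, sent in enumerate(sentences):
            for t in set(sent):
                raw[t] += s_score[i]
        total = sum(raw.values())
        w_score = {t: v / total for t, v in raw.items()}
        # Sentence step: PageRank-style propagation mixed with a word-based term.
        word_part = [sum(w_score[t] for t in set(sent)) for sent in sentences]
        wp_total = sum(word_part)
        new = []
        for i in range(n):
            # Sentences with no outgoing weight contribute nothing here.
            graph = sum(sim[j][i] / out[j] * s_score[j] for j in range(n) if out[j] > 0)
            mixed = alpha * graph + (1 - alpha) * word_part[i] / wp_total
            new.append((1 - damping) / n + damping * mixed)
        norm = sum(new)
        s_score = [v / norm for v in new]  # keep scores summing to 1
    return s_score, w_score


def select(sentences, scores, k=2, max_sim=0.5):
    """Greedily pick up to k top-scored sentences, skipping near-duplicates."""
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        if all(cosine(sentences[i], sentences[j]) <= max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore document order
```

Calling `co_rank(sentences)` and then `select(sentences, scores, k)` yields the indices of the chosen sentences in document order. The threshold-based filter is just one simple way to penalize redundancy; the paper's actual selection criterion may differ.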
Source
《江苏大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2016, Issue 4, pp. 443-449 (7 pages)
Journal of Jiangsu University:Natural Science Edition
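Funding
National Key Technology R&D Program of China (2013BAH16F01)
National Joint Research Center for E-Commerce Information Processing project (2013B01035)
Natural Science Foundation of the Jiangsu Higher Education Institutions of China (15KJB520012)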
Keywords
automatic summarization
single-document summarization
extractive summary
word-sentence co-ranking
PageRank