Abstract
Automatic document summarization is one way to obtain the important information in microblogs, but the short text, high redundancy, and high noise of microblog posts make automatic summarization difficult. To address this, an event summary extraction algorithm based on the content and relevance of individual microblog posts is proposed, called Content and Relativity PageRank (CR-PageRank). The set of microblog posts for an event is built into an event graph; combined with the content quality of each post, CR-PageRank computes a total weight for each post, and representative posts are selected to form an initial summary, which is then processed for readability. Experimental results show that, compared with the TextRank and LexRank algorithms, the algorithm achieves markedly higher precision and recall, and the generated summaries are concise, comprehensive, and readable.
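The pipeline outlined in the abstract (event graph, content-quality weighting, PageRank-style scoring, selection of representative posts) can be illustrated with a minimal sketch. The abstract does not give the CR-PageRank formula, so the code below is not the authors' method: it substitutes a standard personalized PageRank over a cosine-similarity graph, with a content-quality score used as the teleport prior. The function names `rank_posts` and `summarize`, the `quality` input, and the damping value 0.85 are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_posts(posts, quality, damping=0.85, iters=50):
    """Score posts with a PageRank biased by a content-quality prior.

    posts   : list of token lists, one per microblog post
    quality : non-negative content-quality score per post (hypothetical input)
    This is a generic personalized PageRank, assumed here as a stand-in
    for CR-PageRank, whose exact formula is not stated in the abstract.
    """
    n = len(posts)
    bags = [Counter(p) for p in posts]
    # Event graph: edge weight = pairwise similarity, no self-loops.
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]
    total_q = sum(quality)
    prior = [q / total_q for q in quality] if total_q else [1.0 / n] * n
    score = [1.0 / n] * n
    for _ in range(iters):
        # Posts with no edges hand their mass back through the prior.
        dangling = sum(score[j] for j in range(n) if out[j] == 0)
        new = []
        for i in range(n):
            flow = sum(score[j] * w[j][i] / out[j]
                       for j in range(n) if out[j] > 0)
            new.append((1 - damping) * prior[i]
                       + damping * (flow + dangling * prior[i]))
        score = new
    return score


def summarize(posts, quality, k):
    """Return indices of the k highest-scoring posts, in original order."""
    scores = rank_posts(posts, quality)
    top = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

In this sketch a post that shares no vocabulary with the rest of the event receives only its prior mass, so it tends to fall out of the summary. Redundancy removal and the readability processing mentioned in the abstract are not modeled.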
Source
《计算机工程》
CAS
CSCD
PKU Core Journal (北大核心)
2016, No. 11, pp. 64-69 (6 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (61163025)
Natural Science Foundation of Inner Mongolia (2015MS0621)