Automatic Summarization of Microblog Based on High Quality Information Extraction (cited by: 7)
Abstract Automatic summarization is an important means of obtaining key information from microblog platforms. Existing automatic summarization methods for microblogs focus on extracting sentences or keywords from a document collection but lack effective means of removing redundant information and content noise, so the extracted microblog content is of low quality, which in turn degrades the resulting summary. To address this problem, this paper takes the microblog platform as its research object and proposes an information extraction method based on time-frequency transformation, which obtains high-quality microblog texts that are highly relevant to a given topic, low in redundancy, and rich in information. Microblogs with high composite scores form the sample set for summary generation; the sentences of each microblog in this set are scored by weight, and the highest-weighted sentences are selected to compose the summary. Experimental results show that the method effectively filters redundant information and content noise, and that its summaries outperform those of existing automatic summarization methods under both automatic and manual evaluation.
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2015, No. 7, pp. 36-42 (7 pages)
Funding National Natural Science Foundation of China (61070083); 2013 Shenzhen Knowledge Innovation Program
Keywords microblog automatic summarization; redundancy removal; information extraction; automatic evaluation; manual evaluation
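
The two-stage pipeline described in the abstract (first select relevant, non-redundant, informative microblogs, then weight their sentences) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the time-frequency transformation or the scoring features, so the sketch assumes simple stand-ins (topic-term coverage for relevance, distinct-token count for informativeness, Jaccard overlap for redundancy, and average term frequency for sentence weight). All function names, parameters, and thresholds are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; real microblog text would need a proper segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_posts(posts, topic_terms, max_posts=3, max_overlap=0.5):
    """Greedily keep relevant, informative posts that do not repeat earlier picks."""
    topic = set(topic_terms)
    scored = []
    for post in posts:
        tokens = set(tokenize(post))
        relevance = len(tokens & topic) / len(topic) if topic else 0.0
        informativeness = len(tokens)  # assumed proxy: number of distinct tokens
        scored.append((relevance * informativeness, post, tokens))
    scored.sort(key=lambda item: item[0], reverse=True)

    kept = []
    for score, post, tokens in scored:
        if score == 0:
            continue  # off-topic post, treated as noise
        if any(jaccard(tokens, other) > max_overlap for _, other in kept):
            continue  # near-duplicate of a post already kept
        kept.append((post, tokens))
        if len(kept) == max_posts:
            break
    return [post for post, _ in kept]


def summarize(posts, topic_terms, max_sentences=2):
    """Weight sentences of the kept posts by average term frequency; return the top ones."""
    kept = select_posts(posts, topic_terms)
    sentences = [s.strip() for p in kept for s in re.split(r"[.!?]+", p) if s.strip()]
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def weight(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    return sorted(sentences, key=weight, reverse=True)[:max_sentences]


posts = [
    "Earthquake hits the city. Rescue teams arrive.",
    "earthquake hits the city rescue teams arrive",  # near-duplicate
    "I love pizza today",                            # off-topic noise
    "Earthquake relief donations are open.",
]
summary = summarize(posts, ["earthquake", "rescue"])
```

In this toy input the duplicate and the off-topic post are filtered out before sentence weighting, which mirrors the ordering the abstract describes; the multiplicative score and the 0.5 overlap threshold are arbitrary choices made only for illustration.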

