
Keywords extraction and event summarization technology research on official microblog

Cited by: 1
Abstract: An official Microblog is a certified Microblog account that generally belongs to an organization. Its data are highly reliable and clearly labeled, and they also carry strong social influence, so summarizing the organization's events from it can help improve reading efficiency. However, official Microblogs also contain a good deal of information unrelated to the organization, which makes event extraction and summarization very challenging. Taking the characteristics of official Microblog data into account, this paper proposes an event summarization model for official Microblogs based on corpus weighting and label recognition. Drawing on a corpus related to official Microblogs, it also presents a keyword calculation method, Corpus Weighted Ranking (CWR), which provides basic support for blog-post similarity calculation and event summarization. Experimental tests show that, compared with TF-IDF and TextRank, CWR achieves better keyword extraction precision P, recall R, and F value, and it gives good results when the most heavily weighted sentences are later selected to form the event summary.
Authors: GAO Yong-bing (高永兵), YANG Gui-peng (杨贵朋), ZHANG Di (张娣) (Information Engineering School, Inner Mongolia University of Science and Technology, Baotou 014010, China)
Source: Journal of Inner Mongolia University of Science and Technology (CAS), 2017, No. 3, pp. 273-279 (7 pages)
Funding: Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)
Keywords: Official Microblog; Keywords extraction; Similarity; Event summarization; TextRank
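The abstract compares CWR with TF-IDF and TextRank on keyword precision P, recall R, and F value, and then builds the event summary from the most heavily weighted sentences. The abstract does not give CWR's formulas, so the minimal Python sketch below does not reproduce it. Instead it uses plain TF-IDF (one of the paper's baselines) as a stand-in keyword scorer, assumes posts are already word-segmented (Chinese text would first need a word segmenter), and uses hypothetical helper names `tfidf_keywords`, `precision_recall_f`, and `select_sentences`. Scoring a sentence by the summed weights of its terms is also an assumption, not necessarily the paper's rule.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, k=3):
    """Stand-in scorer (plain TF-IDF, NOT the paper's CWR).
    Rank the terms of `doc` (a token list) against `corpus` (a list of
    token lists) and return (top-k terms, {term: weight})."""
    n_docs = len(corpus)
    df = Counter()
    for d in corpus:
        df.update(set(d))  # document frequency: count each term once per doc
    tf = Counter(doc)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF: terms that are rare in the corpus get a higher weight.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = (count / len(doc)) * idf
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


def precision_recall_f(predicted, gold):
    """Keyword-level precision P, recall R and F1 against a gold keyword set."""
    hits = len(set(predicted) & set(gold))
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def select_sentences(sentences, term_weights, n=1):
    """Assumed selection rule: score each tokenized sentence by the summed
    weights of its terms, keep the top-n, and return them in original order."""
    scored = [(sum(term_weights.get(t, 0.0) for t in s), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda x: (-x[0], x[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda x: x[1])]


if __name__ == "__main__":
    # Toy, pre-segmented data (hypothetical, for illustration only).
    post = ["university", "admission", "ceremony", "ceremony", "freshmen"]
    corpus = [
        ["university", "holds", "admission", "ceremony"],
        ["weather", "forecast", "rain", "tomorrow"],
        ["university", "library", "opens", "weekend"],
        post,
    ]
    top, weights = tfidf_keywords(post, corpus, k=3)
    print(top)  # ['ceremony', 'freshmen', 'admission']
    print(precision_recall_f(top, ["ceremony", "admission", "university"]))
    sentences = [["weather", "rain"], ["ceremony", "freshmen"], ["university"]]
    print(select_sentences(sentences, weights, n=1))
```

Summing term weights favours longer sentences; dividing by sentence length is a common alternative. Any other keyword scorer, for example one that up-weights terms from an organization-related corpus as CWR is described as doing, could be swapped in for `tfidf_keywords` while the P/R/F evaluation and sentence selection stay the same.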