
Research and Performance Evaluation on the Theme Based Method for the Public Opinion Tracking (cited by: 4)
Abstract: Public opinion tracking follows designated hot topics in the media information stream in real time, and it has become an active research direction in natural language processing in recent years. The core technique for this task is text classification. Information gain and mutual information are used to compute feature weights, and the features with the highest weights are extracted as the document representation in the vector space model. The Rocchio, K-Nearest Neighbor (KNN), and Bayes methods are each applied to track public opinion on events of a given topic. The best performance on the test set is an F-Measure of 86.2%. Public opinion tracking has broad application prospects in areas such as information security, and it gives users effective guidance for judging the development trend of online hot events in a timely manner.
Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University Core Journals list, 2012, No. 18, pp. 50-53, 109 (5 pages)
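Funding: One of the research outcomes of the National Natural Science Foundation of China Youth Fund project "Research on Information Extraction Techniques in Question-Answering Information Retrieval" (No. 60803086) and the Beijing Natural Science Foundation project "Research on Semantic Entailment Reasoning Techniques and Their Application in Question-Answering Information Retrieval" (No. 4123091).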
Keywords: public opinion tracking; text classification; natural language processing
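
The pipeline named in the abstract (information-gain feature selection, a Rocchio-style classifier, and F-Measure evaluation) can be illustrated with a minimal sketch. The abstract does not give the authors' weighting formula, Rocchio parameters, or data, so everything below is an assumption made for illustration: binary term presence for information gain, a plain nearest-centroid variant of Rocchio with cosine similarity, and a made-up toy corpus. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(pos, total):
    """Binary entropy (in bits) of a set with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    n = len(docs)
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    before = entropy(sum(labels), n)
    after = (len(present) / n) * entropy(sum(present), len(present)) \
        + (len(absent) / n) * entropy(sum(absent), len(absent))
    return before - after


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties: alphabetical)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def vectorize(doc, features):
    """Term-frequency vector over the selected features (assumed weighting)."""
    counts = Counter(doc)
    return [float(counts[t]) for t in features]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_train(docs, labels, features):
    """Nearest-centroid variant: one prototype per class (on-topic, off-topic)."""
    pos = [vectorize(d, features) for d, y in zip(docs, labels) if y == 1]
    neg = [vectorize(d, features) for d, y in zip(docs, labels) if y == 0]
    return centroid(pos), centroid(neg)


def rocchio_predict(doc, features, model):
    """Label a document on-topic (1) if it is closer to the on-topic centroid."""
    pos_c, neg_c = model
    v = vectorize(doc, features)
    return 1 if cosine(v, pos_c) > cosine(v, neg_c) else 0


def f_measure(gold, pred):
    """Balanced F-Measure (F1) for the on-topic class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy corpus (invented for illustration): label 1 = on-topic, 0 = off-topic.
train_docs = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "aftershock", "rescue"],
    ["football", "match", "goal"],
    ["stock", "market", "goal"],
]
train_labels = [1, 1, 0, 0]

features = select_features(train_docs, train_labels, k=3)
model = rocchio_train(train_docs, train_labels, features)
print(features)
print(rocchio_predict(["earthquake", "rescue", "team"], features, model))
```

Mutual information could be substituted for information gain inside `select_features`, and KNN or a naive Bayes classifier for the centroid model, without changing the rest of the sketch.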

References (10)

  • 1. Allan J, Carbonell J, Doddington G, et al. Topic detection and tracking pilot study: Final report[C]//Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop. San Francisco: Morgan Kaufmann Publishers Inc, 1998: 194-218.
  • 2. Zhang Kuo, Li Juanzi, Wu Gang. New event detection based on indexing tree and named entity[C]//Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Amsterdam: ACM Press, 2007: 215-222.
  • 3. Makkonen J, Ahonen-Myka H, Salmenkivi M. Simple semantics in topic detection and tracking[J]. Information Retrieval, 2004, 7(3/4): 347-368.
  • 4. Brants T, Chen F, Farahat A O. A system for new event detection[C]//Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 2003: 330-337.
  • 5. Ji H, Grishman R. Refining event extraction through cross-document inference[C]//Moore J D, Teufel S, Allan J, et al. The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Columbus: ACM Press, 2008: 254-262.
  • 6. Patwardhan S, Riloff E. Effective information extraction with semantic affinity patterns and relevant regions[C]//Proceedings of Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Prague: Association for Computational Linguistics Press, 2007: 717-727.
  • 7. Czech R, Patwardhan S, Riloff E. A unified model of phrasal and sentential evidence for information extraction[C]//Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-09). Stroudsburg: Association for Computational Linguistics Press, 2009: 151-160.
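  • 8. Wang Huizhen, Zhu Jingbo, Ji Duo, Ye Na, Zhang Bin. Adaptive Chinese topic tracking based on feedback learning[J]. Journal of Chinese Information Processing (中文信息学报), 2006, 20(3): 92-98. Cited by: 17
  • 9. Yu Manquan, Luo Weihua, Xu Hongbo, Bai Shuo. Research on hierarchical topic detection in topic detection and tracking[J]. Journal of Computer Research and Development (计算机研究与发展), 2006, 43(3): 489-495. Cited by: 49
  • 10. Wang Yu, Bai Shi, Wang Zheng'ou. A fast KNN algorithm for Web text classification[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2007, 26(1): 60-64. Cited by: 33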

