
TTP: a Topic Time Parser on Chinese News from Internet (cited by 7)
Abstract: Topic-related temporal information in a news article reflects the article's topical features along the time dimension. In news-oriented information processing tasks, topic time is often used to build and analyze news topic models, and it can also serve as an event clue for tracking topic evolution. To address the weak correlation between news topics and topic time in current research on news information processing, the authors analyze news report categories and web page structure features in depth and derive a relation model between news topic and time. Based on this model, they propose a topic time extraction algorithm built on topic weights and unsupervised learning, and implement a topic time parser for Chinese news web pages that automatically extracts topic time and normalizes temporal expressions. Experiments show that the algorithm achieves higher precision than comparable methods and substantially improves the correlation between news topic and topic time; the system as a whole also performs well in evaluation.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2013, Issue 5, pp. 1042-1049 (8 pages)
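Funding: National High Technology Research and Development Program of China (863 Program) (2009AA12Z204); National Natural Science Foundation of China (60776801); Open Project of the National Laboratory of Pattern Recognition (20090029); Open Fund of the Beijing Key Laboratory of Modern Information Science and Network Technology (XDXX1005)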
Keywords: Chinese information processing; topic time; news topic; information extraction
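
The abstract says the parser normalizes temporal expressions but gives no details of how. As a rough illustration only, the Python sketch below maps a few Chinese date expressions to ISO 8601 dates relative to a reference date such as the article's publication date. The function name `normalize`, the table `RELATIVE_DAYS`, and the handful of supported patterns are assumptions made for this example; they are not taken from the paper and do not represent its algorithm.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical day offsets for a few relative expressions (not from the paper).
RELATIVE_DAYS = {
    "前天": -2, "昨天": -1, "昨日": -1,
    "今天": 0, "今日": 0,
    "明天": 1, "明日": 1,
}

# Absolute dates such as "2013年5月1日" or, with the year omitted, "5月1日".
ABSOLUTE = re.compile(r"(?:(\d{4})年)?(\d{1,2})月(\d{1,2})日")


def normalize(expr: str, reference: date) -> Optional[str]:
    """Return an ISO 8601 date string for expr, or None if unrecognized.

    reference is the date that relative expressions are resolved against;
    a missing year is assumed to be the reference year.
    """
    if expr in RELATIVE_DAYS:
        return (reference + timedelta(days=RELATIVE_DAYS[expr])).isoformat()
    match = ABSOLUTE.fullmatch(expr)
    if match is None:
        return None
    year = int(match.group(1)) if match.group(1) else reference.year
    try:
        return date(year, int(match.group(2)), int(match.group(3))).isoformat()
    except ValueError:
        # Well-formed pattern but an impossible calendar date, e.g. "13月40日".
        return None


if __name__ == "__main__":
    published = date(2013, 5, 1)
    for text in ["昨天", "5月3日", "2012年12月31日", "下周"]:
        print(text, normalize(text, published))
```

Resolving a missing year to the reference year is a simplification; a real system would need to decide which reference time to use for each expression, which this sketch does not attempt.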
