Abstract
Topic-related temporal information in a news article reflects the topical features of the news along the time dimension. In news-oriented information processing tasks, topic time is often used to build and analyze news topic models, and it can also serve as an event clue for tracking topic evolution. To address the weak correlation between news topics and topic time in current research on news information processing, this paper analyzes news report categories and web page structure features in depth and derives a relation model between news topics and time. Based on this model, a topic time extraction algorithm built on topic weights and unsupervised learning is proposed, and a topic time parser for Chinese news web pages is implemented that automatically extracts topic time and normalizes temporal expressions. Experiments show that the algorithm achieves higher precision than comparable methods and substantially improves the relevance between news topics and topic time; the system as a whole also achieves satisfactory performance.
Source
《小型微型计算机系统》
CSCD
PKU Core Journals (北大核心)
2013, No. 5, pp. 1042-1049 (8 pages)
Journal of Chinese Computer Systems
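Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (2009AA12Z204)
Supported by the National Natural Science Foundation of China (60776801)
Supported by the Open Project Program of the National Laboratory of Pattern Recognition (20090029)
Supported by the Open Fund of the Beijing Key Laboratory of Modern Information Science and Networks (XDXX1005)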
Keywords
Chinese information processing
topic time
news topic
information extraction