Abstract
Story link detection determines whether the two stories in each pair drawn from a stream of news story pairs discuss the same topic. News stories are short, their feature representations are severely sparse, and their content drifts over time. To address these problems, this paper proposes a dynamic information extending technique that improves the story representation model: the current story is extended with the most recent earlier story on the same topic, so that the original model is updated dynamically. The paper also studies how to refine the extending information, selectively increasing the weights of important features to reduce the influence of noise introduced during extension. Experiments on the Chinese corpus of TDT4 show that dynamic information extending considerably improves story link detection performance, and that the refinement applied to several kinds of features also has a considerable effect on performance.
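The extending step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the representation or the weighting scheme, so the term-frequency vectors, cosine similarity, the decision threshold, the mixing weight `alpha`, the `boost` factor, and the `important` feature set are all assumptions chosen for the example.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return {t: float(c) for t, c in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def extend(current, related, alpha=0.5, important=frozenset(), boost=2.0):
    """Add the related story's terms to the current story.

    Terms listed in `important` get their contribution multiplied by
    `boost`; this stands in for the refinement step (hypothetical scheme).
    """
    extended = dict(current)
    for term, w in related.items():
        factor = boost if term in important else 1.0
        extended[term] = extended.get(term, 0.0) + alpha * factor * w
    return extended


def latest_related(vec, history, threshold):
    """Most recent earlier story whose similarity reaches the threshold."""
    for past in reversed(history):
        if cosine(vec, past) >= threshold:
            return past
    return None


def detect_link(tokens_a, tokens_b, history, threshold=0.3,
                alpha=0.5, important=frozenset(), boost=2.0):
    """Decide whether two stories share a topic, after dynamic extension."""
    vecs = []
    for tokens in (tokens_a, tokens_b):
        v = vectorize(tokens)
        rel = latest_related(v, history, threshold)
        if rel is not None:
            v = extend(v, rel, alpha, important, boost)
        vecs.append(v)
    score = cosine(vecs[0], vecs[1])
    history.extend(vecs)  # later stories can be extended with these
    return score >= threshold, score


# Two short stories with no shared terms, plus one earlier related story.
history = [vectorize(["earthquake", "taiwan", "rescue", "victims"])]
linked, score = detect_link(["earthquake", "rescue"],
                            ["taiwan", "victims"], history)
print(linked, round(score, 2))  # prints: True 0.6
```

Without the earlier story in `history`, the two short stories share no terms and their similarity is zero, which illustrates the sparsity problem that the extension is meant to mitigate.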
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2009, No. 11, pp. 200-203, 241 (5 pages)
Computer Science
Funding
National Natural Science Foundation of China (Grant No. 60403050)
Program for New Century Excellent Talents in University (NCET-06-0926)
Keywords
Story link detection
Topic detection and tracking
Dynamic information extending
Story representation model