
Research on the Dynamic Extending of Story in Story Link Detection (cited by: 2)
Abstract: Story link detection determines whether the two stories in each pair of a stream of news-story pairs discuss the same topic. News stories are short, their representations are severely sparse, and their content drifts over time. To address these problems, this paper proposes a dynamic information extending technique that improves the story representation model: the current story is extended with its most recent previous topic-related story, so that the original model is updated dynamically. The paper also studies how to refine the extending information, selectively increasing the weights of important features to reduce the influence of noise introduced during extension. Experiments on the TDT4 Chinese corpus show that dynamic information extending improves story link detection performance considerably, and that the refinement applied to several kinds of features also has a considerable effect on performance.
Authors: Zhang Xiaoyan, Wang Ting
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 11, pp. 200-203, 241 (5 pages)
Funding: National Natural Science Foundation of China (60403050); Program for New Century Excellent Talents (NCET-06-0926)
Keywords: story link detection; dynamic information extending; story representation model; topic detection and tracking
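
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-frequency representation, the cosine similarity, the function names (`tf_vector`, `extend`, `detect_links`), and the parameters `mix`, `boost`, and `threshold` are all assumptions chosen for illustration. Each story is extended with the most recent story it was judged linked to, and terms in a hypothetical `important` set receive extra weight as a stand-in for the refinement step.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Term-frequency vector of a tokenized story (assumed representation)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def extend(current, previous, important=frozenset(), mix=0.5, boost=2.0):
    """Extend `current` with terms from `previous`.

    `mix` scales the added weight; `boost` further increases the weight of
    terms in `important`. Both values are illustrative assumptions.
    """
    extended = dict(current)
    for term, weight in previous.items():
        factor = boost if term in important else 1.0
        extended[term] = extended.get(term, 0.0) + mix * factor * weight
    return extended


def detect_links(pairs, stories, threshold=0.25, important=frozenset()):
    """Decide, for each story pair in order, whether the two stories are linked.

    `stories` maps a story id to its token list. After a positive decision,
    each story remembers the other as its latest topic-related story, and that
    remembered story is used to extend it in later comparisons.
    """
    latest_related = {}  # story id -> vector of its latest linked story
    decisions = []
    for id_a, id_b in pairs:
        vec_a = tf_vector(stories[id_a])
        vec_b = tf_vector(stories[id_b])
        if id_a in latest_related:
            vec_a = extend(vec_a, latest_related[id_a], important)
        if id_b in latest_related:
            vec_b = extend(vec_b, latest_related[id_b], important)
        linked = cosine(vec_a, vec_b) >= threshold
        if linked:
            latest_related[id_a] = tf_vector(stories[id_b])
            latest_related[id_b] = tf_vector(stories[id_a])
        decisions.append(linked)
    return decisions


stories = {
    "s1": ["quake", "rescue"],
    "s2": ["quake", "rescue", "aid"],
    "s3": ["aid", "donation"],
}
pairs = [("s1", "s2"), ("s1", "s3")]
plain = detect_links(pairs, stories)
refined = detect_links(pairs, stories, important=frozenset({"aid"}))
```

In this toy stream, `s1` and `s3` share no terms, so any similarity between them comes only from the terms `s1` inherits from `s2` after the first pair is judged linked. Whether that inherited evidence is enough to cross the threshold depends on the weighting, which is the kind of effect the refinement step is meant to control.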


