
The Study of Topic Detection Based on Algorithm of Division and Multi-level Clustering with Multi-strategy Optimization (基于多策略优化的分治多层聚类算法的话题发现研究)
Cited by: 38
Abstract: Topic Detection and Tracking (TDT) is an evaluation-driven research area that aims to organize and exploit streams of text according to the events they describe. Since it was proposed in 1996, it has attracted increasing attention. Building on existing mature algorithms, this paper proposes a topic detection algorithm based on division and multi-level clustering. Its core idea is to partition the full data set into groups whose members have some intrinsic relevance, cluster within each group to obtain that group's topics (micro-clusters), and then cluster all micro-clusters to obtain the final topics. Several optimization strategies are applied during clustering to safeguard the quality of the result. A system built on the algorithm was tested on the TDT4 Chinese corpus, and the results indicate that the algorithm is among the best-performing ones currently available.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2006, Issue 1, pp. 29-36 (8 pages)
Funding: National 973 Program of China (2004CB318109)
Keywords: computer application; Chinese information processing; topic detection and tracking; division and multi-level clustering; hierarchical clustering
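
The two-level scheme described in the abstract (split the data into groups, cluster each group into micro-clusters, then cluster the micro-clusters into topics) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes bag-of-words term counts, cosine similarity, centroid-linkage agglomerative clustering with a similarity threshold, and groups formed from consecutive chunks of the document stream; the abstract does not specify the grouping criterion or the optimization strategies, and the latter are omitted here. The names `cosine`, `agglomerate`, `detect_topics` and all parameter values are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(items, threshold):
    """Centroid-linkage agglomerative clustering (assumed, not from the paper).

    `items` is a list of (member_ids, vector) pairs. The two most similar
    clusters are merged repeatedly until no pair reaches `threshold`.
    """
    clusters = [(list(ids), Counter(vec)) for ids, vec in items]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        # Summed term counts point in the same direction as the centroid,
        # which is all that cosine similarity needs.
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)
    return clusters


def detect_topics(docs, group_size=3, micro_threshold=0.5, topic_threshold=0.3):
    """Two-level clustering over `docs`, a list of token lists in stream order.

    Returns the final topics as sorted lists of document indices.
    """
    vectors = [Counter(tokens) for tokens in docs]
    micro = []
    # Level 1: cluster inside each group to obtain micro-clusters.
    for start in range(0, len(docs), group_size):
        end = min(start + group_size, len(docs))
        group = [([i], vectors[i]) for i in range(start, end)]
        micro.extend(agglomerate(group, micro_threshold))
    # Level 2: cluster all micro-clusters to obtain the final topics.
    topics = agglomerate(micro, topic_threshold)
    return sorted(sorted(ids) for ids, _ in topics)
```

Each group is clustered on its own, so the quadratic pairwise comparison runs over a few documents at a time and only the much smaller set of micro-clusters is compared globally. Documents on the same topic that fall into different groups are reunited at the second level, which is why the sketch uses a looser threshold there.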

