Abstract
Since the Topic Detection and Tracking (TDT) evaluation was launched in 1996, TDT research has attracted wide attention and made great progress, but it has also run into many difficulties. Based on an analysis of a large volume of topic data, this paper argues that hierarchical topics differ from hierarchical clustering in that the hierarchy of a topic is determined by how its events are constructed, and that hierarchical topics should be organized into three layers: micro-classes, middle classes, and upper classes. The reason is that hierarchical topics produced automatically by computer analysis must be objectively linked to the real world. On this basis, the paper proposes a theoretically well-founded hierarchical topic detection and tracking method for large-scale real-world data and implements it on a cluster system.
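The abstract does not describe the clustering algorithm itself. As an illustration only, here is a minimal sketch of how a three-layer hierarchy (micro, middle, upper) can be built by clustering stories and then re-clustering the resulting cluster centroids with progressively looser similarity thresholds. It assumes bag-of-words vectors, cosine similarity, and single-pass threshold clustering, a technique often used in TDT work; this is a swapped-in stand-in, not the authors' method, and the function names (`single_pass`, `three_layer_topics`) and threshold values are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Unweighted average of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass(vectors, threshold):
    """Single-pass clustering: each item joins its most similar cluster if the
    similarity is >= threshold, otherwise it starts a new cluster.
    Returns a list of clusters, each a list of item indices."""
    clusters, centroids = [], []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            s = cosine(v, cen)
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([i])
            centroids.append(dict(v))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([vectors[j] for j in clusters[best]])
    return clusters


def three_layer_topics(docs, thresholds=(0.5, 0.3, 0.15)):
    """Hypothetical three-layer builder: layer 0 groups stories (micro-classes),
    layers 1 and 2 re-cluster the previous layer's centroids (middle, upper).
    Returns, per layer, the list of clusters as sorted document indices."""
    items = [Counter(d.lower().split()) for d in docs]
    members = [[i] for i in range(len(docs))]  # documents under each item
    layers = []
    for th in thresholds:
        clusters = single_pass(items, th)
        members = [sorted(j for k in c for j in members[k]) for c in clusters]
        items = [centroid([items[k] for k in c]) for c in clusters]
        layers.append(members)
    return layers
```

For example, `three_layer_topics(["earthquake hits city", "earthquake hits city hard", "earthquake rescue teams city", "election vote results", "election vote count"])` returns one partition of the five stories per layer. Re-clustering centroids rather than raw stories keeps each upper layer a strict coarsening of the layer below, which is one simple way to obtain a fixed number of nested layers instead of a full dendrogram.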
Source
Journal of Guangxi Normal University: Natural Science Edition
CAS
Peking University Core Journal
2007, No. 2, pp. 157-160 (4 pages)
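Funding
National 863 Program (2005AA147030)
National 242 Information Security Program (2005A37)
Beijing Municipal Education Commission Science and Technology Development Program, General Project (KM200600006002)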
Keywords
topic detection and tracking
hierarchical topic detection
hierarchical topic tracking
multilayered clustering
event structure