Abstract
Traditional approaches to Event Detection and Characterization (EDC) treat event detection as a classification problem. They use words as instances to train a classifier, which tends to produce an imbalance between positive and negative training examples, and they suffer from data sparseness when the corpus is small. This paper avoids classifying with words as instances and instead introduces clustering to determine event types. Guided by event triggers, it uses self-similarity to make the value of K in the K-means algorithm converge automatically, which optimizes the clustering algorithm. Then, by combining named entities with their relative position information, the method pinpoints the specific event type. The method removes the dependence on event type templates found in traditional event detection, and the detected events can be applied to automatic text summarization, text retrieval, and topic detection and tracking.
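The abstract does not define the self-similarity measure or the exact convergence rule, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that candidate event mentions are already represented as numeric feature vectors, that the self-similarity of a cluster is the mean pairwise cosine similarity of its members, and that K is increased from a starting value (which could, hypothetically, be seeded from the number of trigger groups) until every cluster reaches a chosen threshold. The names `choose_k`, `self_similarity`, `k_start`, and `threshold` are hypothetical, and the farthest-point seeding is used only to keep the example deterministic.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    # Assumption: deterministic farthest-point seeding, not taken from the paper.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=20):
    """Plain Lloyd iterations; returns the non-empty clusters."""
    centroids = init_centroids(vectors, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def self_similarity(cluster):
    # Assumption: mean pairwise cosine similarity; a singleton counts as 1.0.
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def choose_k(vectors, k_start=1, threshold=0.9):
    """Increase K until every cluster's self-similarity reaches the threshold."""
    for k in range(k_start, len(vectors) + 1):
        clusters = kmeans(vectors, k)
        if min(self_similarity(c) for c in clusters) >= threshold:
            return k, clusters
    return len(vectors), [[v] for v in vectors]
```

Calling `choose_k(vectors, k_start=1, threshold=0.9)` returns the selected K together with the clusters. The threshold trades off cluster purity against the number of event types produced, and it would need tuning on real data.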
Source
《计算机科学》
CSCD
PKU Core Journal
2010, Issue 3, pp. 212-214, 220 (4 pages)
Computer Science
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), grant 2007AA01Z439
Keywords
Event detection
Trigger
Self-similarity
Named entity
Clustering