Self-similarity Clustering Event Detection Based on Triggers Guidance
Abstract: Traditional approaches to Event Detection and Characterization (EDC) treat event detection as a classification problem and train the classifier with individual words as instances. This tends to produce an imbalance between positive and negative training examples and, when the corpus is small, a data sparseness problem. This paper avoids classifying word instances and instead introduces clustering to determine event types. Guided by event trigger words, it uses self-similarity to let the value of K in the K-means algorithm converge on its own, which optimizes the clustering algorithm. It then combines named entities with their position information to pin down the event type more precisely. The method removes the dependence on event-type templates found in traditional event detection, and the detected events have been applied effectively to automatic text summarization, text retrieval, and topic detection and tracking.
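The abstract does not define self-similarity or state how K is updated, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that self-similarity is the mean pairwise cosine similarity inside a cluster, that the initial centroids come from trigger-word vectors, and that K grows by one whenever the least coherent cluster falls below a threshold. The names `adaptive_k_clusters`, `seed_centroids`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def self_similarity(cluster):
    """ASSUMPTION: mean pairwise cosine similarity of the cluster members."""
    n = len(cluster)
    if n < 2:
        return 1.0
    total = sum(cosine(cluster[i], cluster[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, centroids, iters=20):
    """Plain K-means using cosine similarity for assignment."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            best = max(range(len(centroids)),
                       key=lambda k: cosine(p, centroids[k]))
            clusters[best].append(p)
        updated = [mean(c) if c else centroids[k]
                   for k, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids


def adaptive_k_clusters(points, seed_centroids, threshold=0.8):
    """Grow K from the trigger-derived seeds until every cluster is coherent.

    ASSUMPTION: a cluster counts as coherent when its self-similarity reaches
    `threshold`; otherwise its least typical member seeds a new cluster.
    """
    centroids = [list(c) for c in seed_centroids]
    clusters = []
    for _ in range(len(points)):  # bounded, so the loop always terminates
        clusters, centroids = kmeans(points, centroids)
        kept = [(c, m) for c, m in zip(clusters, centroids) if c]
        clusters = [c for c, _ in kept]
        centroids = [m for _, m in kept]
        scores = [self_similarity(c) for c in clusters]
        worst = min(range(len(clusters)), key=scores.__getitem__)
        if scores[worst] >= threshold:
            break
        outlier = min(clusters[worst],
                      key=lambda p: cosine(p, centroids[worst]))
        centroids.append(list(outlier))
    return clusters


# Toy vectors: three topical groups, but only two trigger-derived seeds.
points = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
          [0.1, 0.9, 0], [0, 0, 1], [0, 0.1, 0.9]]
seeds = [[1, 0, 0], [0, 1, 0]]
clusters = adaptive_k_clusters(points, seeds, threshold=0.8)
print(len(clusters))
```

In this toy run the seeds supply K = 2, and the low self-similarity of a mixed cluster pushes K to 3. In a real setting the vectors would be sentence or context features around each trigger word, and the threshold would need tuning on annotated data.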
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2010, No. 3, pp. 212-214, 220 (4 pages)
Funding: Supported by a key project of the National 863 Program (2007AA01Z439)
Keywords: event detection, trigger, self-similarity, named entity, clustering