Abstract
A single document often contains events on several topics, so traditional text clustering cannot group events that share a topic but are scattered across multiple documents. This paper analyzes the existing CURE algorithm and, based on the characteristics of events, improves its selection of representative points and its mechanism for merging small clusters, yielding an improved CURE algorithm. Experimental results show that the improved method achieves better clustering results while maintaining execution efficiency.
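For orientation, the baseline CURE step that the abstract builds on can be sketched as follows. This is a minimal illustration of representative-point selection in the standard algorithm, not the paper's improved method, whose details the abstract does not give. The function names and the parameters `c` (number of representative points) and `alpha` (shrink factor) are assumptions made for this sketch.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def select_representatives(points, c=4, alpha=0.5):
    """Baseline CURE step (assumed parameters c and alpha).

    Picks up to c well-scattered points from the cluster, then moves
    each one toward the cluster centroid by the fraction alpha.
    """
    mean = centroid(points)
    scattered = []
    for _ in range(min(c, len(points))):
        best, best_dist = None, -1.0
        for p in points:
            if p in scattered:
                continue
            # First pick: farthest from the centroid.
            # Later picks: farthest from the points already chosen.
            if scattered:
                d = min(math.dist(p, q) for q in scattered)
            else:
                d = math.dist(p, mean)
            if d > best_dist:
                best, best_dist = p, d
        if best is None:  # only duplicates of chosen points remain
            break
        scattered.append(best)
    # Shrinking toward the centroid dampens the influence of outliers.
    return [tuple(x + alpha * (m - x) for x, m in zip(p, mean))
            for p in scattered]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

In standard CURE, the two clusters with the smallest `cluster_distance` are merged at each step and their representatives are recomputed. The paper's changes to representative selection and to small-cluster merging would replace parts of this baseline.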
Source
Journal of Chongqing University of Arts and Sciences (Social Sciences Edition) (《重庆文理学院学报(社会科学版)》)
2015, No. 5, pp. 121-124 (4 pages)
Funding
Anhui Provincial Quality Engineering Project (2013TSZY088)
Keywords
hierarchical clustering
CURE
representative points
event clustering