An Incremental Algorithm of Text Clustering Based on Semantic Sequences (Cited by: 1)

Abstract: This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation between semantic sequences and computing the cover of the set of similar semantic sequences, the algorithm repeatedly selects the candidate cluster with the minimum entropy overlap value as a result cluster. Experimental comparisons show that the algorithm achieves higher precision than other algorithms under the same conditions, and that the advantage is most pronounced on sets of long documents.
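The selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the formula for entropy overlap, so the code below assumes the definition used in frequent term-based clustering (Beil, Ester, and Xu, 2002): for a candidate cluster C, EO(C) is the sum over documents d in C of -(1/f_d) ln(1/f_d), where f_d is the number of remaining candidates that contain d. The function names, the candidate labels, and the toy data are hypothetical, and the incremental handling of newly arriving documents is not modelled.

```python
import math

def entropy_overlap(cluster, candidates):
    """Entropy overlap of one candidate cluster (assumed FTC-style definition).

    f is the number of remaining candidates containing the document;
    a document covered by only this cluster contributes 0.
    """
    total = 0.0
    for doc in cluster:
        f = sum(1 for docs in candidates.values() if doc in docs)
        total += -(1.0 / f) * math.log(1.0 / f)
    return total

def greedy_select(candidates):
    """Repeatedly pick the candidate with minimum entropy overlap.

    `candidates` maps a label (e.g. a set of similar semantic sequences)
    to the set of documents it covers. Ties are broken by label order.
    """
    remaining = {label: set(docs) for label, docs in candidates.items()}
    result = []
    while remaining:
        best = min(sorted(remaining),
                   key=lambda label: entropy_overlap(remaining[label], remaining))
        covered = remaining.pop(best)
        result.append((best, covered))
        # Remove covered documents from the other candidates; drop empty ones.
        remaining = {label: docs - covered
                     for label, docs in remaining.items() if docs - covered}
    return result

# Hypothetical toy input: three candidate clusters over five documents.
candidates = {
    "s1": {"d1", "d2", "d3"},
    "s2": {"d3", "d4"},
    "s3": {"d5"},
}
result = greedy_select(candidates)
```

On this toy input, "s3" is chosen first because none of its documents appear in any other candidate, so its entropy overlap is zero. Each document ends up in exactly one result cluster.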

Authors: FENG Zhonghui, SHEN Junyi, BAO Junpeng

Affiliation: Institute of Computer Software

Source: Wuhan University Journal of Natural Sciences (武汉大学学报（自然科学英文版）) [CAS], 2006, No. 5, pp. 1340-1344 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (60173058)

Keywords: text clustering; semantic sequence; entropy

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

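Citing documents (1)

1. LI Xiangying, XIONG Yan, LIU Daohua, ZENG Zhaoxia. Earthquake clustering method based on soft distance calculation [J]. Application Research of Computers, 2011, 28(4): 1299-1300.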
