FICW: Frequent Itemset Based Text Clustering with Window Constraint

Abstract: Most existing text clustering algorithms overlook the fact that a document is a word sequence carrying semantic information, and important semantic information resides in the positions of words within that sequence. This paper proposes a novel method, Frequent Itemset-based Clustering with Window (FICW), which exploits this information for text clustering by imposing a window constraint. Experimental results on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
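The abstract does not define the window constraint precisely. The sketch below assumes one plausible reading, not confirmed by the paper: a document supports a word set only if all of its words occur within some span of `w` consecutive word positions. Under that assumption support remains anti-monotone (any subset of an itemset that fits in a window also fits), so Apriori-style level-wise mining still applies. The function names, the parameters `w`, `min_support` and `max_size`, and the toy documents are all hypothetical illustrations, not FICW's actual implementation.

```python
from itertools import combinations


def occurs_within_window(doc, itemset, w):
    """ASSUMED semantics: True if every word of `itemset` appears in the token
    list `doc` inside some span of at most `w` consecutive positions."""
    need = set(itemset)
    if not need <= set(doc):
        return False
    # Slide a window of length w over the token sequence.
    for start in range(len(doc)):
        if need <= set(doc[start:start + w]):
            return True
    return False


def cover(docs, itemset, w):
    """Indices of documents that contain `itemset` within a window of w words."""
    return {i for i, d in enumerate(docs) if occurs_within_window(d, itemset, w)}


def window_support(docs, itemset, w):
    """Number of documents supporting `itemset` under the window constraint."""
    return len(cover(docs, itemset, w))


def frequent_window_itemsets(docs, min_support, w, max_size=3):
    """Apriori-style mining of window-constrained frequent itemsets.
    Itemsets are sorted tuples; returns {itemset: support}."""
    result = {}
    vocab = sorted({t for d in docs for t in d})
    level = []
    for t in vocab:
        s = window_support(docs, (t,), w)
        if s >= min_support:
            result[(t,)] = s
            level.append((t,))
    k = 2
    while level and k <= max_size:
        candidates = set()
        # Join two frequent (k-1)-itemsets that share their first k-2 words.
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:
                cand = tuple(sorted(set(a) | set(b)))
                # Apriori pruning: every (k-1)-subset must already be frequent.
                if all(sub in result for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        level = []
        for cand in sorted(candidates):
            s = window_support(docs, cand, w)
            if s >= min_support:
                result[cand] = s
                level.append(cand)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized.
    docs = [
        "frequent itemset mining for text clustering".split(),
        "text clustering with frequent itemset".split(),
        "clustering search engine results".split(),
        "text mining and search engine clustering".split(),
    ]
    freq = frequent_window_itemsets(docs, min_support=2, w=3)
    for itemset, s in sorted(freq.items(), key=lambda kv: (-len(kv[0]), kv[0])):
        print(itemset, s, sorted(cover(docs, itemset, 3)))
```

On the toy corpus, "frequent" and "text" co-occur in two documents but never within three consecutive words, so the window constraint discards that pair while plain co-occurrence counting would keep it. Each surviving itemset, with the documents returned by `cover`, could then serve as a candidate cluster; how FICW chooses among overlapping candidates is not stated in the abstract and is not reproduced here.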

Authors: ZHOU Chong, LU Yansheng, ZOU Lei, HU Rong

Affiliation: College of Computer Science and Technology

Source: Wuhan University Journal of Natural Sciences (武汉大学学报（自然科学英文版）), 2006, No. 5, pp. 1345-1351 (7 pages); indexed in CAS

Funding: Supported by the Natural Science Foundation of Hubei Province (ABA048)

Keywords: text clustering; frequent itemsets; search engine

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
