FICW: Frequent Itemset Based Text Clustering with Window Constraint

Abstract: Most existing text clustering algorithms overlook the fact that a document is a word sequence carrying semantic information, and that important semantic information resides in the positions of words within that sequence. This paper proposes a novel method, Frequent Itemset-based Clustering with Window (FICW), which exploits this positional information for text clustering by imposing a window constraint. Experimental results on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
Source: Wuhan University Journal of Natural Sciences (武汉大学学报(自然科学英文版)), CAS, 2006, No. 5, pp. 1345-1351 (7 pages)
Funding: Supported by the Natural Science Foundation of Hubei Province (ABA048)
Keywords: text clustering; frequent itemsets; search engine
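
The abstract does not say how FICW defines its window or builds clusters, so the following is only a minimal sketch under stated assumptions. It assumes a document "contains" a term set only if all of its terms occur within some span of at most `w` consecutive tokens, and it mines such window-constrained frequent term sets with a standard Apriori-style level-wise search (a swapped-in technique, not necessarily the paper's). All function and parameter names (`contains_within_window`, `mine_windowed_itemsets`, `w`, `min_support`, `max_size`) are hypothetical. Apriori pruning remains valid under this constraint, because any occurrences that place a superset inside a window also place each of its subsets there.

```python
from itertools import combinations


def contains_within_window(tokens, itemset, w):
    """Return True if all terms of `itemset` occur inside some span of at
    most `w` consecutive tokens (the window constraint assumed here)."""
    n = len(tokens)
    # If the document is shorter than w, the whole document is one window.
    for start in range(max(1, n - w + 1)):
        if itemset.issubset(tokens[start:start + w]):
            return True
    return False


def mine_windowed_itemsets(docs, min_support, w, max_size=3):
    """Hypothetical level-wise (Apriori-style) search for term sets whose
    windowed support, i.e. the number of documents containing them within
    w tokens, is at least min_support.
    Returns {frozenset_of_terms: set_of_document_indices}."""
    # Level 1: a single term always fits in a window of size >= 1.
    covers = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            covers.setdefault(frozenset([term]), set()).add(i)
    current = {s: c for s, c in covers.items() if len(c) >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Join frequent (size-1)-sets; keep a candidate only if every
        # (size-1)-subset is frequent (downward closure).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        nxt = {}
        for cand in candidates:
            # Only documents covering every subset can cover the candidate,
            # so check the window constraint only on their intersection.
            parents = [current[frozenset(sub)] for sub in combinations(cand, size - 1)]
            cover = {
                i for i in set.intersection(*parents)
                if contains_within_window(docs[i], cand, w)
            }
            if len(cover) >= min_support:
                nxt[cand] = cover
        result.update(nxt)
        current = nxt
    return result


# Tiny illustrative corpus (made up for this sketch).
docs = [
    "frequent itemset text clustering with window".split(),
    "text mining of frequent patterns in long documents about clustering".split(),
    "window constraint for frequent itemset clustering".split(),
]
itemsets = mine_windowed_itemsets(docs, min_support=2, w=3)
for terms, cover in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(terms), sorted(cover))
```

In this toy run, {"frequent", "clustering"} is dropped with `w=3` because no three-token span covers both terms in documents 0 and 1; with a very large `w` (effectively no window) it would be kept with support 3. Each surviving cover can be read as a candidate document cluster in the frequent-itemset style; how FICW turns such candidates into final clusters is not described in the abstract and is not sketched here.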