
The Research and Implementation of Text Clustering Based on WEKA (cited by: 1)
Abstract: Text clustering is an important research branch of text mining and is the application of clustering methods to text processing. This paper first discusses and summarizes, in some depth, the text clustering process based on the vector space model (VSM). It then reviews existing text clustering algorithms and the evaluation metrics commonly used to assess clustering quality. Building on this prior work, the paper uses the 20 Newsgroups text corpus and the vector space representation model to implement text preprocessing and the k-means clustering algorithm on the open-source data mining platform WEKA. Based on the clustering results actually obtained, it proposes optimizations in text representation, feature selection, and feature dimensionality reduction.
Author: 陈嘉勇 (Chen Jiayong)
Source: 《中国管理信息化》 (China Management Informationization), 2009, No. 21, pp. 9-12 (4 pages)
Keywords: Text Mining; Text Clustering; Vector Space Model; WEKA
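
The pipeline summarized in the abstract (preprocessing, vector space representation, k-means) can be illustrated with a minimal sketch. This is not the paper's WEKA implementation: it is plain Python using only the standard library, the four toy documents are hypothetical stand-ins for 20 Newsgroups posts, and the TF-IDF weighting and farthest-first centroid initialization are assumptions made here to keep the example small and deterministic.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.
    A fuller preprocessing step would also drop stop words and stem."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Map each document to a unit-length TF-IDF vector over a shared vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for word, tf in Counter(toks).items():
            vec[index[word]] = tf * math.log(n_docs / doc_freq[word])
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Deterministic initialization (an assumption of this sketch): start from
    the first vector, then repeatedly add the vector farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=50):
    """Standard k-means: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two sports-like and two graphics-like documents.
docs = [
    "the team won the hockey game",
    "hockey players scored in the game",
    "the graphics card renders images",
    "new graphics software for images",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On a real corpus the vocabulary grows to tens of thousands of terms, which is why the feature selection and dimensionality reduction steps named in the abstract matter in practice.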

References (8)

  • 1 Tan A H. Text Mining: The State of the Art and the Challenges [C]// Proceedings of the PAKDD, 1999.
  • 2 Feldman R, Dagan I. Knowledge Discovery in Textual Databases (KDT) [C]// Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada. AAAI Press, 1995: 112-117.
  • 3 Berry M W, Castellanos M. Survey of Text Mining II: Clustering, Classification, and Retrieval [M]. New York: Springer, 2007.
  • 4 Hearst M, Pedersen J. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results [C]// Proceedings of the 19th Annual International ACM/SIGIR Conference, Zurich, August 1996.
  • 5 Xu R, Wunsch D. Survey of Clustering Algorithms [J]. IEEE Transactions on Neural Networks, 2005, 16(3).
  • 6 Ayad H, Kamel M S. Topic Discovery from Text Using Aggregation of Different Clustering Methods [M]. London: Springer, 2002.
  • 7 Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques [M]. 2nd Edition. Morgan Kaufmann, 2005.
  • 8 Lee J M, Calvo R A. Scalable Document Classification [J]. Intelligent Data Analysis, 2005, 9(4): 65-80.

Co-cited literature: 8

Citing literature: 1

Second-level citing literature: 35
