
Exploratory text mining algorithm based on high-dimensional clustering (cited by: 4)
Abstract: Because free text is unstructured, text mining has become an important branch of data mining, and many kinds of text mining algorithms have emerged in recent years. This paper proposes an exploratory text mining algorithm based on high-dimensional clustering, which uses text mining as a guide for data mining in data-oriented text. The algorithm requires only a small number of iterations to produce good clusters from very large text collections. Mapping to other recorded data and assigning text records to user groups can further improve the algorithm's results. Tests on relevant data and analysis of the experimental results confirm the feasibility and effectiveness of the proposed method.
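Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2013, No. 4, pp. 988-990, 1050 (4 pages in total)
Funding: Scientific research project fund of the Guangxi Department of Education (201106LX745, 201204LX593)
Keywords: free text; high-dimensional clustering; data coverage; text mining; data mining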
