A Text Clustering Method Based on an Improved Newman Fast Algorithm
Abstract: To address the heavy computational cost of text clustering, this paper proposes a clustering method that combines concept lattices with the Newman fast algorithm. First, each text is represented as a set of feature words, and feature vectors are extracted by statistical methods. Next, word weights are computed with an IDF-based (TF-IDF) weighting formula and then discretized. The keywords are then expressed as a formal context, and the similarity between formal concepts is computed with a similarity formula. Finally, a Newman network is constructed, and the texts are clustered according to the rules of the Newman network algorithm. A worked example shows that the method produces the correct classification while greatly reducing computational complexity: the Newman fast algorithm runs in only O((m+n)n) time (in Newman's formulation, n is the number of vertices and m the number of edges).
Source: Science Technology and Engineering (《科学技术与工程》), 2010, No. 30, pp. 7550-7553 (4 pages)
Keywords: complex networks; Newman fast algorithm; text clustering; concept lattices
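
The last two steps of the abstract (building a network from pairwise similarities, then clustering it with the Newman fast algorithm) can be illustrated with a minimal sketch. This is not the paper's implementation: the formal-concept similarity formula is not given on this page, so Jaccard similarity over keyword sets stands in for it as an assumption, and the names `jaccard`, `build_edges`, `newman_fast`, and the `threshold` parameter are hypothetical. The agglomeration step follows Newman's greedy modularity rule: repeatedly merge the connected pair of communities with the largest modularity gain ΔQ = 2(e_ij - a_i a_j), and keep the partition with the highest Q.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed stand-in, not the paper's formula)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(docs, threshold):
    """Link two documents when their keyword-set similarity reaches `threshold`."""
    return [(i, j) for i, j in combinations(range(len(docs)), 2)
            if jaccard(docs[i], docs[j]) >= threshold]


def newman_fast(n, edges):
    """Greedy modularity agglomeration on an unweighted, undirected graph.

    Returns (best_partition, best_Q), where best_partition is a list of sets.
    """
    m = len(edges)
    if m == 0:
        return [{i} for i in range(n)], 0.0
    # e[i][j]: half the fraction of edges joining communities i and j
    # a[i]:    fraction of edge ends attached to community i
    e = {i: {} for i in range(n)}
    a = {i: 0.0 for i in range(n)}
    for u, v in edges:
        e[u][v] = e[u].get(v, 0.0) + 1 / (2 * m)
        e[v][u] = e[v].get(u, 0.0) + 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    members = {i: {i} for i in range(n)}
    # With every node in its own community, Q = -sum(a_i^2)
    q = -sum(x * x for x in a.values())
    best_q, best = q, [set(s) for s in members.values()]
    while True:
        # Modularity gain for every connected pair of communities
        candidates = [(2 * (w - a[i] * a[j]), i, j)
                      for i in e for j, w in e[i].items() if i < j]
        if not candidates:
            break
        dq, i, j = max(candidates)
        # Merge community j into community i
        for k, w in e.pop(j).items():
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[k].get(i, 0.0) + w
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best, best_q
```

For clarity this sketch rescans all community pairs at every merge, which costs O(m) per step and O(mn) overall; it does not attempt to reproduce the bookkeeping behind the O((m+n)n) bound quoted in the abstract. A typical call would be `newman_fast(len(docs), build_edges(docs, 0.5))`, where `docs` is a list of keyword sets.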