An Efficient Method of Text Clustering Based on LSI and SNN

Cited by: 7
Abstract: Drawing on latent semantic indexing (LSI) and dynamic self-organizing map neural networks, this paper proposes a new text clustering method. Clustering text with a dynamic self-organizing map network does not require the number of clusters to be specified in advance, and a new cluster can be created at any suitable position, so the approach is flexible and accurate. For high-dimensional text feature vectors, however, clustering is slow. The proposed method uses LSI to build the vector space model of the document collection, which introduces semantic relations into the term weights and reduces the "noise" contained in the original term matrix, so that the semantic relations between terms and documents stand out more clearly. Singular value decomposition (SVD) effectively reduces the dimensionality of the vector space, which compensates for this weakness of the self-organizing neural network (SNN) and improves both the accuracy and the speed of text clustering.
Source: Journal of Tianjin University: Science and Technology, 2004, No. 11, pp. 1026-1030 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60275020).
Keywords: text clustering; latent semantic indexing; singular value decomposition; self-organizing neural network; vector space model
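
The two stages named in the abstract can be illustrated with a short sketch. This is not the paper's implementation: the function names `lsi_project` and `growing_cluster`, the `threshold` parameter, and the toy term-document matrix are all assumptions made for illustration. The second function is a much simplified threshold-based stand-in for a dynamic self-organizing network. It only mimics the property that new clusters are created on demand rather than fixed in advance.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Map documents into a k-dimensional latent space via truncated SVD.

    term_doc is a (n_terms, n_docs) weighted term-document matrix.
    Returns an array of shape (n_docs, k).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row is one document.
    return vt[:k].T * s[:k]


def growing_cluster(points, threshold):
    """Assign each point to the most similar center, or create a new one.

    Simplified stand-in (assumption), not a dynamic self-organizing map:
    a new cluster is created whenever the best cosine similarity
    falls below `threshold`.
    """
    eps = 1e-12
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / np.maximum(norms, eps)
    centers, counts, labels = [], [], []
    for x in unit:
        best = -1
        if centers:
            sims = [c @ x / max(np.linalg.norm(c), eps) for c in centers]
            best = int(np.argmax(sims))
        if best < 0 or sims[best] < threshold:
            centers.append(x.copy())
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            counts[best] += 1
            # Running mean keeps the center near its members.
            centers[best] += (x - centers[best]) / counts[best]
            labels.append(best)
    return labels


# Toy matrix (hypothetical): 6 terms x 4 documents, two topics.
term_doc = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

docs = lsi_project(term_doc, k=2)            # 6 dimensions reduced to 2
labels = growing_cluster(docs, threshold=0.5)
print(labels)
```

In this toy case the clustering runs on 2-dimensional vectors instead of 6-dimensional ones, which is the kind of saving the abstract attributes to SVD. On a real collection, `k` and `threshold` would have to be tuned.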