
Research and implementation of Web text clustering algorithm WTCA

Times cited: 1
Abstract: This paper proposes WTCA, a new Web text clustering algorithm based on the self-organizing feature map (SOM) neural network. The algorithm works in two stages: training the SOM network, then cluster analysis. It is self-stabilizing and needs no externally supplied evaluation function; it identifies the most meaningful features in the concept space and is robust to noise. The algorithm has been applied to the Modern Long-distance Education Net, where it automatically clusters text collected from distance-education sites. Through an information-navigation mechanism, this lets users browse important information quickly and extract useful knowledge from massive Web text sources.
Authors: 郑煜 (Zheng Yu), 钱榕 (Qian Rong)
Source: Computer Engineering and Applications (《计算机工程与应用》; indexed in CSCD and the Peking University core journal list), 2007, No. 4, pp. 170-172 (3 pages)
Funding: Natural Science Foundation of Beijing City of China (Grant No. 4022008)
Keywords: Web text mining; text clustering; structural model for unstructured data mining; Self-Organization Feature Mapping (SOM)
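The abstract says only that WTCA first trains a SOM network and then performs cluster analysis on the trained map; it does not give the map topology, distance measure, or learning schedules. The Python sketch below illustrates that generic two-stage SOM approach using Kohonen's standard online update rule, and it is not the authors' implementation. Every specific in it is an assumption: the toy documents, the L2-normalized term-frequency vectors, the 1-D map of 4 nodes, the linearly decaying learning rate and Gaussian neighborhood radius, and the hypothetical helper names `tf_vectors`, `train_som` and `cluster`.

```python
import math
import random
from collections import Counter


def tf_vectors(docs):
    """Map whitespace-tokenized documents to L2-normalized term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vecs = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vocab, vecs


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Index of the map node whose weight vector is closest to v."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], v))


def train_som(vecs, n_nodes=4, epochs=100, lr0=0.5, radius0=1.5, seed=0):
    """Stage 1 (assumed form): train a 1-D SOM with Kohonen's online rule."""
    rng = random.Random(seed)
    dim = len(vecs[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                    # assumed linear decay
        radius = max(radius0 * (1 - frac), 0.1)  # assumed shrinking neighborhood
        for v in rng.sample(vecs, len(vecs)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, v)
            for i, w in enumerate(weights):
                # Gaussian neighborhood on the 1-D grid around the winning node
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


def cluster(weights, vecs):
    """Stage 2 (simplest choice): documents sharing a winning node form a cluster."""
    groups = {}
    for idx, v in enumerate(vecs):
        groups.setdefault(best_matching_unit(weights, v), []).append(idx)
    return sorted(groups.values())


# Hypothetical toy corpus: two distance-education texts, two SOM texts.
docs = [
    "distance education course online",
    "online course distance learning",
    "neural network som training",
    "som neural network clustering",
]
vocab, vecs = tf_vectors(docs)
weights = train_som(vecs)
print(cluster(weights, vecs))
```

Treating each map node as one cluster is the simplest possible second stage. On an input this small, the exact grouping depends on the map size, the schedules and the seed, and the paper's own cluster-analysis stage may work differently.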

References: 11

Secondary references: 32


Co-citing literature: 418

Co-cited literature: 6


Citing literature: 1

Secondary citing literature: 1
