Comparative research on text knowledge discovery algorithms for network public opinion (cited by: 1)
Abstract  This paper studies five algorithms applied to text clustering in network public opinion analysis: system (hierarchical) clustering, string kernels, K-nearest neighbor (KNN), support vector machine (SVM), and topic models. Using network public opinion data as the data set and the R language environment as the experimental tool, the strengths and weaknesses of the five algorithms are compared and simulation experiments are carried out. The experimental results show that topic models are better suited to text clustering than the other algorithms. Within topic models, the CTM (correlated topic model) method is more suitable for exploring and discovering relations between classes, while the Gibbs sampling method outperforms CTM on text clustering.
Source  Journal of Shandong University (Natural Science); indexed in CAS, CSCD, and the Peking University Core Journal list; 2014, No. 9, pp. 62-68, 82 (8 pages in total)
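Funding  Beijing Natural Science Foundation (9142002); Science and Technology Program General Project of the Beijing Municipal Education Commission (KM201410028020)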
Keywords  topic model; text knowledge discovery; text clustering; network public opinion