Application Analysis of Text Clustering Algorithms in Public Opinion Monitoring (cited by: 4)

Published English title: Applied research of text clustering algorithm in network monitoring public opinion
Abstract: To meet the needs of topic detection in online public opinion monitoring systems, and to overcome two weaknesses of the classic single-pass algorithm when clustering web text (sensitivity to input order and low precision), this paper improves the single-pass clustering algorithm. The improved algorithm adopts an average-link strategy and introduces the idea of "generations" to cluster documents in batches. It keeps the simplicity and efficiency of single-pass clustering while overcoming its shortcomings, balancing the real-time requirements and the accuracy of online topic detection. Experimental analysis shows that the improved algorithm substantially outperforms the original single-pass algorithm in miss rate, false alarm rate, and running time. The experiments demonstrate that the improved algorithm is effective and practical for raising topic detection accuracy.
Authors: 李岩 (Li Yan), 娄云 (Lou Yun)
Source: 《电子设计工程》 (Electronic Design Engineering), 2013, No. 1, pp. 70-73 (4 pages)
Keywords: online public opinion; topic detection; text clustering; single-pass
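
The abstract names two modifications to single-pass clustering: an average-link comparison and batch-wise ("generation") processing. The record does not give the paper's details, so the following is only a minimal sketch under stated assumptions: documents are weighted by raw term frequency and split on whitespace (Chinese text would first need word segmentation, and the paper may use a different weighting), and a "generation" is read here as a fixed-size batch that is clustered on its own and then merged into the existing topics. The function names, `batch_size`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Build a (text, term-frequency vector) pair; whitespace tokens are an assumption."""
    return (text, Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_link(group_a, group_b):
    """Mean pairwise cosine similarity between two groups of documents."""
    total = sum(cosine(a[1], b[1]) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def single_pass(items, clusters, threshold):
    """Add each item (a list of documents) to its most similar cluster,
    or open a new cluster when no similarity reaches the threshold."""
    for item in items:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = average_link(item, cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.extend(item)
        else:
            clusters.append(list(item))
    return clusters


def cluster_by_generation(texts, batch_size=3, threshold=0.3):
    """Cluster texts one batch ("generation") at a time.

    Assumed reading of the abstract, not the paper's confirmed procedure.
    """
    topics = []
    for start in range(0, len(texts), batch_size):
        batch = [[vectorize(t)] for t in texts[start:start + batch_size]]
        # Step 1: cluster the new generation on its own.
        local = single_pass(batch, [], threshold)
        # Step 2: merge the generation's clusters into the running topics.
        single_pass(local, topics, threshold)
    return [[text for text, _ in topic] for topic in topics]
```

Calling `cluster_by_generation(list_of_texts)` returns a list of topics, each a list of the original texts. Comparing a document against every member of a cluster, rather than only its first member, is what makes the assignment less dependent on arrival order, at the cost of more similarity computations per document.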