
Hot-Word Detection and Relations Analysis Based on Document Clustering (cited by: 2)
Abstract: This paper proposes a method that applies a topic clustering algorithm to computing the degree of association between hot words. In the hot-word discovery stage, features are extracted from news texts to build a vector space model, and a K-means algorithm with optimized initial cluster centers is used to obtain the hot-spot clusters. In the association analysis stage, the word category distance is first computed from the hot-spot clusters and is then combined, as a weighted sum, with the news co-occurrence rate and the hot-word co-occurrence rate to give the hot-word association degree. The method has been successfully applied in the public opinion monitoring system of the University of South China and has obtained good results in practical operation.
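The weighted-sum step of the association analysis can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names, the Jaccard-style definition of the co-occurrence rates, the use of cluster membership as a stand-in for the word category distance, the daily hot-word lists used for the hot-word co-occurrence rate, and the weights `(0.5, 0.3, 0.2)`. It is not the authors' implementation.

```python
def cooccurrence_rate(items, a, b):
    """Assumed definition: items containing both words / items containing either."""
    both = sum(1 for s in items if a in s and b in s)
    either = sum(1 for s in items if a in s or b in s)
    return both / either if either else 0.0


def category_distance(hot_clusters, a, b):
    """Assumed stand-in for the word category distance: 0.0 when the two
    words always fall in the same hot-spot clusters, 1.0 when they never do."""
    return 1.0 - cooccurrence_rate(hot_clusters, a, b)


def hot_word_relatedness(a, b, hot_clusters, news_docs, hot_lists,
                         weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three factors; the weights are hypothetical."""
    w_cat, w_news, w_hot = weights
    closeness = 1.0 - category_distance(hot_clusters, a, b)
    news_rate = cooccurrence_rate(news_docs, a, b)
    hot_rate = cooccurrence_rate(hot_lists, a, b)
    return w_cat * closeness + w_news * news_rate + w_hot * hot_rate


# Toy data (invented): each set holds the words of one cluster, one news
# document, or one daily hot-word list.
hot_clusters = [{"exam", "dorm"}, {"canteen", "library"}]
news_docs = [{"exam", "dorm"}, {"exam", "library"},
             {"exam", "dorm", "library"}, {"canteen"}]
hot_lists = [{"exam", "dorm"}, {"exam"}, {"dorm", "canteen"}]

score = hot_word_relatedness("exam", "dorm", hot_clusters, news_docs, hot_lists)
print(round(score, 4))
```

In this toy setup, "exam" and "dorm" share a hot-spot cluster and co-occur in several documents, so they score well above "exam" and "canteen", which never appear together. In practice the weights would need to be tuned against the monitoring data.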
Source: Modern Computer (mid-month edition) (《现代计算机(中旬刊)》), 2016, No. 5, pp. 56-59, 68 (5 pages in total)
Funding: Hunan Provincial Philosophy and Social Science Fund (No. 14YBA335)
Keywords: K-means algorithm; SVM; hot words; word-group relationships
