Abstract
This paper proposes a method that applies a topic clustering algorithm to computing the degree of association between hot words. In the hot-word discovery stage, features are extracted from news text to build a vector space model, and a K-means algorithm with optimized initial cluster centers is used to obtain hot-topic clusters. In the association analysis stage, the word category distance is first computed from the hot-topic clusters and then combined, as a weighted sum, with the news co-occurrence rate and the hot-word co-occurrence rate to give the hot-word association degree. The method has been successfully applied in the public opinion monitoring system of the University of South China, where it has achieved good results in practical operation.
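The weighted sum in the association analysis stage can be illustrated with a minimal sketch. The abstract does not give the weights or the exact definition of each factor, so the function name `association_degree`, the default weights, and the assumption that all three factors are already normalized to [0, 1] (with the category distance converted so that larger means more closely related) are hypothetical choices made only for illustration.

```python
def association_degree(category_score, news_cooccurrence, hotword_cooccurrence,
                       weights=(0.4, 0.3, 0.3)):
    """Combine three factors into one association score by weighted sum.

    All three inputs are assumed to lie in [0, 1]. The default weights
    are placeholders, not values taken from the paper.
    """
    factors = (category_score, news_cooccurrence, hotword_cooccurrence)
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(f < 0.0 or f > 1.0 for f in factors):
        raise ValueError("each factor must lie in [0, 1]")
    # Weighted accumulation of the three factors.
    return sum(w * f for w, f in zip(weights, factors))


if __name__ == "__main__":
    # Made-up factor values for one pair of hot words.
    score = association_degree(0.5, 0.2, 0.1, weights=(0.5, 0.3, 0.2))
    print(round(score, 2))
```

Because the weights sum to 1 and each factor lies in [0, 1], the resulting score also stays in [0, 1], which makes scores for different word pairs directly comparable.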
Funding
Hunan Provincial Philosophy and Social Science Foundation (No. 14YBA335)