Abstract
Data clustering is the process of identifying natural groups, or clusters, in multidimensional data according to some similarity measure. Clustering is a fundamental procedure in many disciplines, so researchers from many fields are actively studying it. This article first surveys representative partition-based clustering methods. It then runs comparative experiments with these algorithms on hot topic detection in Internet public opinion, and analyzes which of them are better suited to that task. Finally, it summarizes the study, discusses its limitations, and points out directions for improvement.
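The abstract does not name the partition-based methods it compares. As an illustration only, the sketch below shows k-means, a standard partition-based algorithm, chosen by us and not confirmed as one of the paper's methods. It alternates an assignment step (each point joins its nearest centroid under squared Euclidean distance) and an update step (each centroid moves to the mean of its members). The function names and the first-k-points initialization are our own simplifying assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the dissimilarity k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Hypothetical minimal k-means: returns (cluster label per point, centroids)."""
    # Assumption: deterministic initialization with the first k points;
    # k-means++ or random restarts are more common in practice.
    centroids: List[Point] = [points[i] for i in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Converged: no point changed cluster since the last update.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    print(kmeans(data, 2))
```

For hot topic detection, each point would presumably be a vector representation of a post or news item (for example TF-IDF weights), and each resulting cluster a candidate topic. That mapping is an assumption here, not a description of the paper's pipeline.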
Authors
DENG Xian-jun
YANG Ya-qian
LUO Zhao
CHEN Xu-dong
SHEN Xiao-ping
(Chongqing University of Posts and Telecommunications, Chongqing 400065; ISoftStone Information Technology Group Chengdu Technology Co., Ltd., Chengdu, Sichuan 610097)
Source
Digital Technology & Application
2018, Issue 5, pp. 146-149 (4 pages)
Keywords
data clustering
clustering algorithm
Internet public opinion
hot topic detection