Abstract
To accommodate the characteristics of microblog data (short texts, low word frequency, and limited semantic expression), improve the accuracy of topic discovery, and help users extract useful information from large volumes of microblog data, this paper proposes a microblog topic discovery method based on BTM and weighted K-Means. First, to address the sparsity of microblog data, the BTM model is used to model the short texts and obtain topic words. Second, to address shortcomings of the traditional K-Means algorithm, a weighted K-Means algorithm is proposed to discover microblog topics. Finally, experiments are conducted to validate the method. The experimental results show that the BTM and weighted K-Means method solves the problems of high dimensionality and sparsity in microblog data and improves the accuracy and effectiveness of hot-topic discovery.
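The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the K-Means weights are defined, so the per-feature weights below are purely an assumption, and the function names (`extract_biterms`, `weighted_kmeans`) are hypothetical. The biterm step assumes each short post is treated as a single context window, so every unordered word pair in it counts as one biterm.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document.

    Assumption: the whole post is a single context window.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one (assumed) weight per feature."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd-style iterations using the feature-weighted distance above.

    `points` could be, for example, per-document topic vectors from BTM.
    `centers` are the initial centroids supplied by the caller.
    """
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid under the weighted distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: weighted_sq_dist(p, centers[k], weights))
            for p in points
        ]
        # Update step: mean of the members; empty clusters keep their centroid.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In this sketch, a lower weight on a feature reduces its influence on cluster assignment; the weighting scheme used in the paper itself may differ.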
Authors
CHEN Feng (陈凤)
MENG Zuqiang (蒙祖强)
School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China
Source
Journal of Guangxi Normal University (Natural Science Edition) (《广西师范大学学报(自然科学版)》)
Indexed in: CAS; PKU Core (北大核心)
2019, Issue 3, pp. 71-78 (8 pages)
Funding
National Natural Science Foundation of China (61762009)