Topic Discovery in Microblog Based on BTM and Weighting K-Means (cited by: 2)
Abstract: Microblog data consist of short texts with low word frequencies and little semantic context. To handle these properties, improve the accuracy of topic discovery, and help users extract useful information from large volumes of microblog data, this paper proposes a topic discovery method based on BTM and weighted K-Means. First, to address data sparsity, the BTM model is used to model the short microblog texts and obtain topic words. Second, to address the shortcomings of the traditional K-Means algorithm, a weighted K-Means algorithm is proposed to perform topic discovery. Finally, experiments validate the method: the results show that combining BTM with weighted K-Means addresses the high dimensionality and sparsity of microblog data and improves the accuracy and effectiveness of hot topic discovery.
Authors: CHEN Feng (陈凤); MENG Zuqiang (蒙祖强) (School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China)
Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), 2019, No. 3, pp. 71-78 (8 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (61762009)
Keywords: biterm topic model (BTM); weighting K-Means; microblogging data; topic discovery
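
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function shows the standard BTM preprocessing idea: each short text is turned into unordered word pairs (biterms), so co-occurrence statistics are pooled over the whole corpus rather than per document. The second part is only an assumption: the abstract does not say how the K-Means weighting is defined, so this sketch uses a per-dimension weight in the squared Euclidean distance, which is one common variant and may differ from the paper's scheme. All function names (`extract_biterms`, `weighted_sq_dist`, `weighted_kmeans`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per dimension (assumed scheme)."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def weighted_kmeans(points, centers, weights, iters=10):
    """Lloyd-style K-Means that assigns points using the weighted distance.

    `centers` holds the initial centers. Hypothetical helper, not the paper's code.
    """
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center under the weighted distance.
        labels = [
            min(range(len(centers)),
                key=lambda j: weighted_sq_dist(p, centers[j], weights))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(old))  # keep the center of an empty cluster
        if new_centers == [list(c) for c in centers]:
            break  # centers stopped moving
        centers = new_centers
    return labels, centers


# Toy corpus of tokenized short texts (made-up example data).
docs = [["weibo", "topic", "model"], ["topic", "model"]]
biterm_counts = Counter(b for doc in docs for b in extract_biterms(doc))

# Four toy 2-D points; the weights decide which dimension drives the split.
points = [[0, 0], [0, 10], [1, 0], [1, 10]]
labels_x, _ = weighted_kmeans(points, [[0, 5], [1, 5]], weights=[100, 0.01])
labels_y, _ = weighted_kmeans(points, [[0.5, 0], [0.5, 10]], weights=[0.01, 100])
```

In a full pipeline the vectors passed to the clustering step would presumably be the topic representations produced by BTM rather than raw coordinates; the toy points here only show how the weights change which dimension dominates the assignment.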
