
Hotspot Mining Based on LDA Model and Microblog Heat (cited by: 62)
Abstract: This paper analyzes a shortcoming of the traditional LDA (Latent Dirichlet Allocation) model in microblog hotspot mining: the probability results it produces are abstract and difficult to interpret in practical terms. Taking into account the characteristics of microblog data and the notion of information quantity from information theory, it proposes the concept of microblog heat, introduces it into LDA-based hotspot mining, and constructs an LDA model based on microblog heat. Experiments on microblog data collected through an API show that the new method performs the same as the old one, while also producing a more intuitive microblog heat table and more convincing mining conclusions.
Authors: 唐晓波 (Tang Xiaobo), 向坤 (Xiang Kun)
Source: 《图书情报工作》 (Library and Information Service), CSSCI, Peking University Core Journal, 2014, No. 5, pp. 58-63 (6 pages)
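Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)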
Keywords: LDA; microblog heat; topic model; hotspot mining













