
Microblog Hot Topics Discovery Method Based on Probabilistic Topic Model (cited by: 7)
Abstract: Microblog posts are short, spread in real time, have a complex structure, and contain many deformed (variant) words, so the traditional vector space model (VSM) text representation and latent semantic analysis (LSA) cannot model them well. This paper proposes a two-stage clustering algorithm based on probabilistic latent semantic analysis (pLSA) and K-means clustering (Kmeans). In addition, it defines a microblog popularity measure and a ranking scheme, which together support the discovery of hot topics. Experiments show that the method clusters topics effectively and detects hot topics.
Source: Computer Systems & Applications (《计算机系统应用》), 2014, No. 8, pp. 163-167 (5 pages)
Keywords: probabilistic latent semantic analysis; topic detection; microblog; Kmeans
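The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the number of topics, the features passed to Kmeans, or the popularity formula. The sketch assumes that stage one fits pLSA with EM on a document-term count matrix, that stage two runs K-means on the resulting P(z|d) vectors, and that cluster popularity is the sum of a hypothetical per-post engagement count (for example reposts plus comments). The names `plsa`, `kmeans`, and `rank_clusters` are illustrative.

```python
import random


def _normalize(row):
    """Scale a list of non-negative numbers so it sums to 1."""
    total = sum(row) or 1.0
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Stage 1 (assumed): fit pLSA with EM; return P(z|d) for each document."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][w]
                                   for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w) for the M-step.
                for z in range(n_topics):
                    new_zd[d][z] += n * post[z]
                    new_wz[z][w] += n * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [_normalize(r) for r in new_zd]
        p_w_z = [_normalize(r) for r in new_wz]
    return p_z_d


def kmeans(points, k, n_iter=20, seed=0):
    """Stage 2 (assumed): plain K-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_clusters(labels, engagement):
    """Hypothetical popularity: total engagement per cluster, hottest first."""
    heat = {}
    for lab, e in zip(labels, engagement):
        heat[lab] = heat.get(lab, 0) + e
    return sorted(heat.items(), key=lambda kv: kv[1], reverse=True)
```

A caller would build the count matrix from segmented posts, then chain the calls: `labels = kmeans(plsa(counts, n_topics), k)` followed by `rank_clusters(labels, engagement)`. Clustering on P(z|d) rather than raw term counts is the design choice the abstract motivates, since short posts give very sparse term vectors.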
