Microblog Topic Detection Combining Mutual Information and Topic Models (cited by: 5)

Microblog hot topic detection based on positive point mutual information and probabilistic topic model
Abstract: To address the challenge that the feature sparsity of short-text streams poses for hot topic detection, this paper proposes a microblog hot topic detection method that combines term mutual information with a probabilistic topic model. A word co-occurrence matrix weighted by mutual information is built, and symmetric nonnegative matrix factorization (sNMF) is applied to it to obtain the term-topic matrix. A probabilistic latent semantic analysis (pLSA) model is then used to model the topic-microblog relation and discover topics. Finally, a microblog hotness measure is defined and used for analysis and ranking, which supports hot topic detection. Experiments show that the method clusters topics effectively and detects hot topics.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2016, No. 6, pp. 61-66 (6 pages)
Funding: National Natural Science Foundation of China (No. 61163039, No. 61363058); Gansu Provincial Department of Education project (No. 2013A-016)
Keywords: term co-occurrence matrix; symmetric nonnegative matrix factorization; probabilistic latent semantic analysis; microblog hot topic detection
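
The first two steps named in the abstract (a mutual-information-weighted word co-occurrence matrix, followed by symmetric nonnegative matrix factorization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes document-level co-occurrence, positive pointwise mutual information (PPMI) as the weighting, and a damped multiplicative update for the factorization; the function names `ppmi_matrix` and `symmetric_nmf`, the toy corpus, and all parameter values are hypothetical. The pLSA step and the hotness ranking are omitted.

```python
import numpy as np

def ppmi_matrix(docs):
    """Build a document-level word co-occurrence matrix and convert it to PPMI."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = [index[w] for w in set(doc)]  # each word counted once per document
        for i in ids:
            for j in ids:
                if i != j:
                    counts[i, j] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    # PMI(i, j) = log( p(i, j) / (p(i) p(j)) ); zero counts yield -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return vocab, np.maximum(pmi, 0.0)  # keep only positive associations

def symmetric_nmf(a, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate a symmetric nonnegative matrix a by h @ h.T with h >= 0.

    Uses a damped multiplicative update (an assumption of this sketch):
    h <- h * ((1 - beta) + beta * (a h) / (h h^T h)).
    """
    rng = np.random.default_rng(seed)
    h = rng.random((a.shape[0], k))
    for _ in range(iters):
        numer = a @ h
        denom = h @ (h.T @ h) + eps  # eps avoids division by zero
        h *= (1.0 - beta) + beta * numer / denom
    return h

# Toy corpus of tokenized microblogs (hypothetical data).
docs = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "sichuan", "relief"],
    ["football", "match", "goal"],
    ["football", "goal", "league"],
]
vocab, ppmi = ppmi_matrix(docs)
h = symmetric_nmf(ppmi, k=2)  # rows: terms, columns: topics
for t in range(h.shape[1]):
    top = np.argsort(h[:, t])[::-1][:3]
    print("topic", t, [vocab[i] for i in top])
```

Each row of `h` gives one term's nonnegative weights over the `k` topics, which plays the role of the term-topic matrix that the abstract passes on to the pLSA stage.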


