
User Interest Discovery Method Based on the LDA Topic Model (times cited: 1)

Discovering User Interest Using Latent Dirichlet Allocation
Abstract: User interest is an important subject in research on microblog users, and this paper extracts it with a clustering method. Because microblog short texts have sparse features and strong context dependency, traditional methods do not perform well on them. The paper therefore expands the features of microblog short texts using the LDA (Latent Dirichlet Allocation) topic model. LDA introduces latent topics; through topic-based similarity (TBS) it expands the text features to a certain extent and compensates for the sparsity of the original features. In addition, when a word is polysemous, topic-based similarity clearly separates its different senses, which addresses the context-dependency problem. On this basis, user interest is extracted by text clustering. Experiments show that introducing the LDA model markedly improves both clustering quality and user-interest extraction, effectively addressing the feature sparsity and context dependency of microblog short texts in user-interest discovery.
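The abstract does not spell out how the topic-based feature expansion or the topic-based similarity (TBS) is computed. The Python sketch below shows one common way to do both, under stated assumptions: the topic-word matrix `PHI` is assumed to be already estimated (the toy values are invented), a text's topic mixture is approximated by averaging p(z|w) over its words instead of running full LDA inference, expansion appends high-probability words from the text's dominant topic, and similarity is the cosine of two topic mixtures. The names `infer_theta`, `expand`, `topic_similarity`, `top_n` and `min_prob` are hypothetical and do not come from the paper.

```python
import math

# Toy vocabulary and topic-word matrix PHI[k][v] = p(word v | topic k).
# These numbers are invented for illustration; in practice PHI would come
# from an LDA model trained on the microblog corpus.
VOCAB = ["apple", "iphone", "phone", "screen", "fruit", "juice"]
PHI = [
    [0.25, 0.25, 0.25, 0.21, 0.02, 0.02],  # topic 0: "electronics"
    [0.30, 0.02, 0.02, 0.02, 0.34, 0.30],  # topic 1: "food"
]


def infer_theta(words):
    """Approximate topic mixture of a short text.

    Averages p(z | w), taken proportional to PHI[z][w] under a uniform topic
    prior, over the known words. A cheap stand-in for real LDA inference
    (e.g. Gibbs sampling), not the paper's procedure.
    """
    k = len(PHI)
    known = [w for w in words if w in VOCAB]
    if not known:
        return [1.0 / k] * k
    theta = [0.0] * k
    for w in known:
        col = [PHI[z][VOCAB.index(w)] for z in range(k)]
        total = sum(col)
        for z in range(k):
            theta[z] += col[z] / total
    return [t / len(known) for t in theta]


def expand(words, top_n=2, min_prob=0.1):
    """Feature expansion: append up to top_n words of the text's dominant
    topic (with p(w|z) >= min_prob) that the text does not already contain."""
    theta = infer_theta(words)
    z = max(range(len(theta)), key=lambda i: theta[i])
    ranked = sorted(range(len(VOCAB)), key=lambda v: PHI[z][v], reverse=True)
    extra = [VOCAB[v] for v in ranked
             if VOCAB[v] not in words and PHI[z][v] >= min_prob]
    return words + extra[:top_n]


def topic_similarity(words_a, words_b):
    """Topic-based similarity: cosine of the two texts' topic mixtures."""
    a, b = infer_theta(words_a), infer_theta(words_b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


if __name__ == "__main__":
    print(expand(["apple", "iphone"]))  # ['apple', 'iphone', 'phone', 'screen']
    print(expand(["apple", "juice"]))   # ['apple', 'juice', 'fruit']
    # No shared words, yet both texts lean on the same topic:
    print(round(topic_similarity(["apple", "iphone"], ["phone", "screen"]), 2))  # 0.94
    # Shared word "apple", but used in different senses:
    print(round(topic_similarity(["apple", "iphone"], ["apple", "juice"]), 2))   # 0.69
```

On this toy data the last two calls mirror the abstract's two claims: posts with no words in common still score high because they share a topic (the sparsity problem), while posts sharing the ambiguous word "apple" score lower because their other words pull "apple" toward different topics (the context-dependency problem).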
Author: Chu Taotao
Source: Software (软件), 2016, No. 12, 5 pages
Funding: National Basic Research Program of China (973 Program) (2013CB329606)
Keywords: user interest; short text; LDA; feature expansion; K-means
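The keywords name K-means as the clustering step. Below is a minimal Python sketch of Lloyd's K-means over per-post topic vectors. Treating each post's LDA topic mixture as its feature vector, using Euclidean distance, and passing explicit `init` indices (instead of k-means++, so the toy run is reproducible) are assumptions for illustration; the six vectors are invented and the function name `kmeans` is hypothetical.

```python
import math


def kmeans(points, k, init=None, max_iter=100):
    """Lloyd's K-means with Euclidean distance.

    `init` gives the indices of the starting centroids; it defaults to the
    first k points, a simplification (k-means++ is the usual choice).
    """
    start = init if init is not None else list(range(k))
    centroids = [list(points[i]) for i in start]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical topic mixtures (3 topics) for six posts.
    posts = [
        [0.80, 0.10, 0.10], [0.70, 0.20, 0.10],
        [0.10, 0.80, 0.10], [0.20, 0.70, 0.10],
        [0.10, 0.10, 0.80], [0.15, 0.05, 0.80],
    ]
    labels, centroids = kmeans(posts, k=3, init=[0, 2, 4])
    print(labels)  # [0, 0, 1, 1, 2, 2]
```

Each resulting cluster can then be read as one candidate interest, for example via its centroid's dominant topic; how the paper maps clusters to user interests is not stated in this record.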