A User Interest Discovery Method Based on the LDA Topic Model (cited by: 1)

Discovering User Interest Using Latent Dirichlet Allocation


Abstract: User interest is an important topic in research on microblog users, and this paper extracts user interests by clustering. Because microblog short texts have very sparse features and strong context dependency, traditional methods do not achieve good results. This paper therefore applies feature expansion based on the LDA topic model to microblog short texts. The LDA topic model introduces latent topics and, through topic-based similarity (TBS), expands the text features to some extent, compensating for the sparsity of the original features. In addition, when handling polysemous words, topic-based similarity clearly distinguishes the different word senses, which addresses the context dependency problem. On this basis, user interests are extracted with a text clustering method. The experiments show that introducing the LDA model clearly improves both clustering quality and user interest extraction, and that the proposed method effectively addresses the feature sparsity and context dependency of microblog short texts in user interest discovery.
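The abstract gives no implementation details, so the following is only a minimal sketch of how such a pipeline might look: fit LDA on tokenized posts, append each post's topic proportions to its term-frequency vector, and cluster the expanded vectors with K-means. The function names (`lda_gibbs`, `expand_features`, `kmeans`), the toy posts, the hyperparameters, and the choice of concatenation as the expansion step are all assumptions made for illustration, not the author's method.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, vocab, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]       # count of topic k in doc d
    topic_word = [[0] * V for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                     # total tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assign[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta

def expand_features(docs, vocab, theta, topic_weight=1.0):
    """Append topic proportions to sparse term frequencies (assumed expansion)."""
    feats = []
    for doc, th in zip(docs, theta):
        counts = Counter(doc)
        n = len(doc) or 1  # guard against empty posts
        tf = [counts[w] / n for w in vocab]
        feats.append(tf + [topic_weight * p for p in th])
    return feats

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy posts; "apple" is ambiguous between a device and a fruit.
docs = [
    ["apple", "phone", "screen"],
    ["apple", "fruit", "juice"],
    ["phone", "battery", "screen"],
    ["fruit", "juice", "banana"],
]
vocab = sorted({w for doc in docs for w in doc})
theta = lda_gibbs(docs, n_topics=2, vocab=vocab)
feats = expand_features(docs, vocab, theta)
labels = kmeans(feats, k=2)
print(labels)
```

In this sketch the topic proportions add a dense, low-dimensional component to each vector, so two posts that share few words can still end up close together when they share a topic. In practice a library implementation of LDA and K-means and a proper Chinese tokenizer would likely replace the hand-written pieces.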

Author: Chu Taotao (储涛涛)

Institution: School of Computer Science, Beijing University of Posts and Telecommunications

Source: Software (《软件》), 2016, No. 12, 5 pages

Funding: National Key Basic Research Program of China (973 Program) (2013CB329606)

Keywords: user interest; short text; LDA; feature expansion; K-means

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

