Abstract
Microblog posts are short, loosely formatted, incompletely described and noisy, which makes it hard to extract their topics efficiently. To address this, an improved model based on LDA, called SMLDA, is proposed. The model jointly considers the associations between microblog authors, the tags that mark microblog-specific topics, the repost relationships between microblog texts and a background topic, and it derives its parameters with the Gibbs sampling algorithm. Experiments on a real Sina Weibo data set show that, compared with the LDA model, the SMLDA model is more efficient and extracts topics more accurately.
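For readers unfamiliar with the inference step named in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for the standard LDA baseline only. It does not implement SMLDA: the author, tag, repost and background-topic components are not specified in this record and are left out. The function name `lda_gibbs`, its parameters and the default hyperparameters are illustrative assumptions, not taken from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (baseline only, not SMLDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                               # n_k
    assign = []                                                  # topic of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

Calling `lda_gibbs(docs, num_topics=2, vocab_size=4)` on a small list of word-id lists returns the per-document topic counts and per-topic word counts, from which the usual smoothed estimates of the document-topic and topic-word distributions can be computed.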
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2015, No. 3, pp. 241-244 (4 pages)
Funding
National High Technology Research and Development Program of China (863 Program), grant 2012AA011005