
Improved Short Text Clustering Method of LDA Model (cited by: 4)
Abstract: In short text clustering, the traditional LDA (Latent Dirichlet Allocation) model does not consider the connection between the text and the topic. To address this, we propose an LDA model with discriminative learning ability: the LDA-λ model introduces a binomial distribution into the basic LDA model to strengthen the discriminative ability of terms. We analyze the model theoretically and run comparative experiments. The results show that the improved LDA model reaches an accuracy (ACC) of 0.7384, a normalized mutual information (NMI) of 0.8191, and a pairwise F-measure (PWF) of 0.6941. These are improvements of 1.62%, 2.51%, and 1.2% over the traditional LDA model, and of 2.83%, 10.99%, and 1.89% over the VSM model, respectively. In these experiments, the improved LDA model therefore outperforms both the traditional LDA model and VSM on clustering problems.
Authors: SUN Hong (孙红); YU Wei-guo (俞卫国) (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology; Shanghai Key Laboratory of Modern Optical System, Shanghai 200093, China)
Source: Software Guide (《软件导刊》), 2021, No. 9, pp. 1-6 (6 pages)
Funding: National Natural Science Foundation of China (61472256, 61170277, 61703277); Hujiang Foundation (C14002).
Keywords: topic model; improved LDA model; text clustering; probabilistic generative model; short text; topic mining
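
The abstract reports three clustering metrics (ACC, NMI, PWF) without defining them. The sketch below shows one common way to compute each from ground-truth labels and predicted cluster ids. It is an illustration under assumptions, not the authors' evaluation code: the function names are hypothetical, ACC uses a brute-force search over cluster-to-class mappings (practical only for a handful of clusters), and NMI is normalized by the geometric mean of the two entropies, which may differ from the normalization used in the paper.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def clustering_accuracy(truth, pred):
    """ACC: fraction of items correct under the best one-to-one
    mapping from cluster ids to class labels (brute force)."""
    counts = Counter(zip(pred, truth))
    clusters = list(dict.fromkeys(pred))
    classes = list(dict.fromkeys(truth))
    # Pad with placeholders so every cluster gets a distinct target.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        hits = sum(counts[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, pred):
    """NMI = I(truth; pred) / sqrt(H(truth) * H(pred)) (assumed normalization)."""
    n = len(truth)
    h_t, h_p = entropy(truth), entropy(pred)
    if h_t == 0 and h_p == 0:
        return 1.0  # both partitions are a single group
    if h_t == 0 or h_p == 0:
        return 0.0  # one partition carries no information
    joint = Counter(zip(truth, pred))
    n_t, n_p = Counter(truth), Counter(pred)
    mi = sum(
        (c / n) * math.log(c * n / (n_t[t] * n_p[p]))
        for (t, p), c in joint.items()
    )
    return mi / math.sqrt(h_t * h_p)


def pairwise_f_measure(truth, pred):
    """PWF: harmonic mean of precision and recall over item pairs,
    where a pair is 'positive' if both items share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        tp += same_class and same_cluster
        fp += same_cluster and not same_class
        fn += same_class and not same_cluster
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = [0, 0, 0, 1, 1, 1]
    pred = [1, 1, 0, 0, 0, 0]
    print(clustering_accuracy(truth, pred))
    print(normalized_mutual_information(truth, pred))
    print(pairwise_f_measure(truth, pred))
```

All three metrics ignore the actual cluster ids and depend only on how items are grouped, which is why a predicted clustering that matches the ground truth up to renaming scores 1.0 on each. For larger numbers of clusters, the permutation search in `clustering_accuracy` would normally be replaced by an assignment solver such as the Hungarian algorithm.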