Abstract
In short-text clustering, the traditional LDA (Latent Dirichlet Allocation) model does not consider the connection between texts and topics. To address this, we propose an LDA model with discriminative learning ability. In the resulting LDA-λ model, a binomial distribution is introduced into the basic LDA model to strengthen the discriminative ability of terms. We analyze the model theoretically and evaluate it in comparative experiments. The improved LDA model reaches an accuracy (ACC) of 0.7384, a normalized mutual information (NMI) of 0.8191, and a pairwise F-measure (PWF) of 0.6941. These are improvements of 1.62%, 2.51%, and 1.2%, respectively, over the traditional LDA model, and of 2.83%, 10.99%, and 1.89%, respectively, over the VSM model. The experiments thus indicate that the improved LDA model outperforms both LDA and VSM on clustering tasks.
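The three reported metrics can be illustrated with a minimal sketch. This record does not give the paper's exact definitions, so the code below assumes commonly used ones: ACC under the best one-to-one mapping of clusters to classes, NMI normalized by the arithmetic mean of the two entropies, and PWF computed over all document pairs. The function names are hypothetical and the brute-force mapping is only practical for a few clusters.

```python
from collections import Counter
from itertools import combinations, permutations
from math import log


def clustering_accuracy(truth, pred):
    """ACC: share of items correct under the best one-to-one cluster-to-class mapping."""
    classes, clusters = sorted(set(truth)), sorted(set(pred))
    if len(clusters) > len(classes):
        raise ValueError("sketch assumes no more clusters than classes")
    overlap = Counter(zip(pred, truth))
    # Brute force over all mappings (assumption: only a handful of clusters).
    best = max(
        sum(overlap[(c, k)] for c, k in zip(clusters, perm))
        for perm in permutations(classes, len(clusters))
    )
    return best / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """NMI: mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    mi = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


def pairwise_f(truth, pred):
    """PWF: F-measure over item pairs placed in the same cluster vs. the same class."""
    tp = same_truth = same_pred = 0
    for i, j in combinations(range(len(truth)), 2):
        in_same_class = truth[i] == truth[j]
        in_same_cluster = pred[i] == pred[j]
        tp += in_same_class and in_same_cluster
        same_truth += in_same_class
        same_pred += in_same_cluster
    if tp == 0:
        return 0.0
    precision, recall = tp / same_pred, tp / same_truth
    return 2 * precision * recall / (precision + recall)
```

All three functions take two equal-length label sequences, the gold classes and the predicted clusters. Cluster ids need not match class ids, since each metric is invariant to relabeling.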
Authors
SUN Hong (孙红)
YU Wei-guo (俞卫国)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology; Shanghai Key Laboratory of Modern Optical System, Shanghai 200093, China
Source
Software Guide (《软件导刊》)
2021, No. 9, pp. 1-6 (6 pages)
Funding
National Natural Science Foundation of China (61472256, 61170277, 61703277)
Hujiang Foundation (C14002)