
An Improved LDA Algorithm Based on Graph Mining (cited by: 1)
Abstract: LDA is one of the most widely used models in text topic recognition, but its bag-of-words assumption assigns every word the same weight. As a result, topic distributions tend to skew toward high-frequency words, which harms the semantic coherence of the recognized topics. To address this problem, this paper proposes GoW-LDA, an improved LDA algorithm based on graph mining. It first builds a semantic graph model from the order in which feature word pairs co-occur in the text, then uses the weighted degree of nodes, a network statistical feature, to incorporate the semantic structure and relatedness of the text into LDA topic modeling in the form of a weight correction. Experimental results show that, compared with traditional LDA and TF-IDF-based LDA, GoW-LDA substantially reduces the perplexity of the topic model, improves the PMI (mutual information) score of the recognized topics, and effectively shortens training time, offering a new approach to text topic recognition.
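The graph-building step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sliding-window size, the edge direction (earlier word to later word), the definition of weighted degree as the sum of incoming and outgoing edge weights, and the function names `weighted_degrees` and `normalized_weights` are all assumptions made for illustration. The abstract does not specify how the resulting weights are applied inside LDA, so that step is left out.

```python
from collections import defaultdict


def weighted_degrees(tokens, window=1):
    """Build a directed co-occurrence graph and return each word's weighted degree.

    Assumption: an edge (earlier, later) is added for every pair of distinct
    words where `later` appears within `window` positions after `earlier`.
    The edge weight counts how often that ordered pair occurs.
    """
    edge_weight = defaultdict(int)
    for i, source in enumerate(tokens):
        for target in tokens[i + 1 : i + 1 + window]:
            if source != target:  # skip self-loops
                edge_weight[(source, target)] += 1

    # Weighted degree (assumed here): sum of weights on all incident edges.
    degree = defaultdict(int)
    for (source, target), weight in edge_weight.items():
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


def normalized_weights(degree):
    """Scale weighted degrees so that they sum to 1 (empty input gives {})."""
    total = sum(degree.values())
    if total == 0:
        return {}
    return {word: value / total for word, value in degree.items()}


if __name__ == "__main__":
    doc = ["topic", "model", "topic", "graph"]
    print(normalized_weights(weighted_degrees(doc, window=1)))
```

For the toy document above with `window=1`, the word "topic" touches three edges, "model" two, and "graph" one, so "topic" receives half of the total weight. Weights like these could then replace the uniform word weights that plain LDA assumes.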
Authors: LI Shan; CHEN Miao-miao; ZHENG Chen (Dept. of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source: Computer and Modernization (《计算机与现代化》), 2022, No. 7, pp. 61-66 (6 pages)
Funding: Fundamental Research Funds for the Central Universities (NJ2019023).
Keywords: text topic recognition; graph mining; LDA (Latent Dirichlet Allocation)