
Chinese Short Text Topic Analysis by Latent Dirichlet Allocation Model with Co-word Network Analysis (Cited by: 42)
Abstract: Because of the feature sparsity of short texts, traditional LDA and PLSA topic models perform poorly on short-text analysis. Drawing on community detection techniques from social network analysis, this paper proposes the CA-LDA model (Latent Dirichlet Allocation Model with Co-word Network Analysis). On top of the traditional LDA model, co-word network analysis is introduced: word co-occurrence across documents is used to build a word network; a latent-space dimension-reduction method merges words with identical structural positions in the network according to the automorphic equivalence principle, reducing the sparsity of the word matrix without loss of information; lexical collocation (adjacency of network nodes) is captured by using the eigenvector centrality of the co-word network to adjust word weights in the topic model, so that, by recursive accumulation, words collocated with important words become more important themselves; and during Gibbs sampling of the traditional LDA model, a community detection algorithm based on the latent position cluster model is added, raising the probability that words with the same collocation relations are assigned to the same topic. Experimental results show that the model performs well on short-text analysis.
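To make the pipeline described in the abstract concrete, the sketch below builds a co-word co-occurrence network over a few toy documents, scores words by eigenvector centrality, and uses those scores to re-weight term counts before fitting an ordinary LDA. This is a minimal illustration assuming networkx and scikit-learn; the sample documents, the `1 + centrality` weighting, and all names are hypothetical, and the paper's automorphic-equivalence merging step and modified Gibbs sampler are not reproduced here.

```python
# Minimal sketch of the co-word-network side of CA-LDA as the abstract
# describes it: co-occurrence network -> eigenvector centrality -> re-weighted
# term counts -> standard LDA. Illustrative only; not the paper's exact method.
from itertools import combinations

import networkx as nx
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical short documents standing in for real short texts.
docs = [
    "topic model short text",
    "short text sparse feature",
    "word network topic model",
]

# Term-document count matrix over the vocabulary.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # shape: (n_docs, n_terms)
vocab = vectorizer.get_feature_names_out()

# Co-word network: connect every pair of words that co-occur in a document,
# accumulating co-occurrence counts as edge weights.
G = nx.Graph()
G.add_nodes_from(vocab)
for doc in docs:
    words = set(doc.split())
    for w1, w2 in combinations(sorted(words), 2):
        if G.has_edge(w1, w2):
            G[w1][w2]["weight"] += 1
        else:
            G.add_edge(w1, w2, weight=1)

# Eigenvector centrality of each word in the co-word network; words that
# collocate with important words receive higher scores (the "recursive
# accumulation" the abstract refers to).
centrality = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
weights = np.array([centrality.get(w, 0.0) for w in vocab])

# Re-weight term counts by centrality (one simple choice; the paper's exact
# weighting scheme may differ) and fit a plain LDA on the result.
X_weighted = X.multiply(1.0 + weights.reshape(1, -1))
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X_weighted)
print(doc_topics.round(3))
```

Running the sketch prints a document-topic distribution in which collocation-heavy words carry extra weight; the paper additionally merges structurally equivalent words before this stage and modifies the Gibbs sampler itself, which a weighted count matrix alone does not capture.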
Authors: Cai Yongming; Chang Qing (Business School, University of Jinan, Jinan 250002; School of Economics and Management, Inner Mongolia University of Technology, Huhhot 010051)
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), CSSCI, CSCD, Peking University Core Journal, 2018, Issue 3, pp. 305-317 (13 pages)
Funding: Shandong Provincial Social Science Planning Project "Research on the Vulnerability of Shandong Province's Infrastructure System Based on Complex Network Theory" (14CGLJ03)
Keywords: Latent Dirichlet Allocation Model with Co-word Network Analysis (CA-LDA); dimension reduction with the latent space model; automorphic equivalence principle; latent position cluster model for social networks