Research on Word-weighted LDA Topic Recognition Method Based on Linear Regression Model
Cited by: 1
Abstract: Web resources in social tagging systems contain a large amount of latent knowledge, and the resources are independent of one another. To address these two issues, this paper proposes a word-weighted latent Dirichlet allocation (LDA) topic recognition method based on a linear regression model. A linear regression model is used to build a fitting function between arbitrary text resources, and this fitting function yields a weight value for each resource, which addresses the problem that the resources are independent and identically distributed. The data points of the fitting function are then weighted, which assigns a weight to every word in the corpus and finally gives a weight coefficient for each dictionary word. On top of these word weights, a word-weighted LDA model is built, and Gibbs sampling is used to mine the latent topics of the Web resources in depth. Experimental results show that, compared with traditional topic models, the new word-weighted LDA algorithm achieves better topic recognition on Web resources.
Authors: TAI Yue (邰悦); GE Bin (葛斌) (Anhui University of Science and Technology, Huainan 232001, China)
Source: Journal of Jinling Institute of Technology (《金陵科技学院学报》), 2021, No. 2, pp. 39-45 (7 pages)
Funding: National Natural Science Foundation of China (51874003, 61703005); Natural Science Foundation of Anhui Province (1808085MG221)
Keywords: linear regression model; word weighting; LDA; Gibbs sampling
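The abstract does not give the update rule of the word-weighted Gibbs sampler. One common way to build a word-weighted LDA is to let every token of word w add a weight λ_w, rather than 1, to the document-topic and topic-word count tables. The Python sketch below illustrates that generic weighted-count scheme only, under stated assumptions: the word weights are supplied as a plain dictionary (standing in for the dictionary-word weight coefficients that the paper derives from its linear regression step, which is not reproduced here), words missing from the dictionary default to weight 1.0, and α and β are symmetric priors. The function names `weighted_lda_gibbs` and `top_words` and all parameter names are hypothetical; this is not the authors' implementation.

```python
import random
from collections import defaultdict


def weighted_lda_gibbs(docs, word_weight, num_topics, alpha=0.1, beta=0.01,
                       iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA with weighted token counts.

    Hypothetical sketch: each token of word w adds word_weight[w]
    (default 1.0, weights assumed positive) to the count tables instead
    of 1. With all weights equal to 1 this is the standard collapsed
    Gibbs sampler for LDA.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    doc_topic = [[0.0] * num_topics for _ in docs]     # weighted n_{d,k}
    topic_word = [defaultdict(float) for _ in topics]  # weighted n_{k,w}
    topic_total = [0.0] * num_topics                   # weighted n_k
    assignments = []

    def update(d, w, k, delta):
        # Add (or, with a negative delta, remove) one token's weight.
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z_doc):
            update(d, w, k, word_weight.get(w, 1.0))
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                lam = word_weight.get(w, 1.0)
                update(d, w, assignments[d][i], -lam)  # remove this token
                # Weighted-count analogue of the standard LDA full
                # conditional p(z_i = k | rest), up to a constant.
                probs = [(doc_topic[d][k] + alpha)
                         * (topic_word[k][w] + beta)
                         / (topic_total[k] + vocab_size * beta)
                         for k in topics]
                k_new = rng.choices(topics, weights=probs)[0]
                update(d, w, k_new, lam)               # add it back
                assignments[d][i] = k_new
    return assignments, doc_topic, topic_word


def top_words(topic_word, n=3):
    """Words with the highest weighted counts in each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


if __name__ == "__main__":
    docs = [["apple", "banana", "fruit", "apple"],
            ["python", "code", "bug", "code"]]
    weights = {"fruit": 2.0, "code": 1.5}  # hypothetical word weights
    z, dt, tw = weighted_lda_gibbs(docs, weights, num_topics=2,
                                   iterations=50, seed=1)
    print(top_words(tw))
```

Passing an empty `word_weight` dictionary makes every token count as 1, so the sketch falls back to plain LDA; larger weights let the corresponding words pull more strongly on the topic assignments of the documents they appear in.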