Abstract
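Temporal-correlation meta-information between texts and document label information are introduced into the latent Dirichlet allocation (LDA) model, and an on-line labeled incremental topic (OLT) model is proposed. First, the OLT model optimizes the mapping between text-label meta-information and topics. Second, it uses a dynamic dictionary to improve the fit between the model and the text. The model improves the continuity of the transfer computation for the prior-distribution hyperparameters, and addresses the LDA model's inability to exploit correlations between text attributes and topics for topic discovery and evolution analysis. Experimental results show that the proposed OLT model significantly improves multi-label classification accuracy, enhances the model's generalization ability, and improves its running performance.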
By introducing the time-series and label features of documents into the latent Dirichlet allocation (LDA) model, an on-line labeled incremental topic model is presented. First, the model realizes multi-label prediction on the basis of an optimized label-topic mapping and improves the clustering results. Second, it achieves a reasonable correlation of text streams with the help of a dynamic dictionary and optimized hyper-parameter calculation. The experimental results suggest that the on-line labeled incremental topic model can improve the decision accuracy for multi-labels, as well as generalization ability and operating efficiency.
Source
《吉林大学学报(理学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 5, pp. 992-998 (7 pages)
Journal of Jilin University: Science Edition
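Funding
National Natural Science Foundation of China (Grant Nos. 60373099, 60973040, 61303131)
Program for Cultivating Outstanding Young Scientific Research Talents in Fujian Province Universities (Grant No. JA13196)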