Belief Propagation of Latent Dirichlet Allocation Based on Dynamic Priori Parameters

Abstract: Variational Bayes, Gibbs sampling, and belief propagation are the three main approximate inference algorithms for the latent Dirichlet allocation (LDA) model, and belief propagation clearly outperforms the other two in both efficiency and accuracy. To obtain a highly interpretable latent semantic space, this paper proposes a belief propagation algorithm that dynamically adjusts the prior parameters during iteration. The parameters are learned automatically by a fixed-point iteration method with an added Gamma prior. The paper also explores how symmetric and asymmetric priors affect the model's generalisation ability and text classification accuracy. Experimental results show that the proposed dynamic asymmetric prior algorithm improves the model's generalisation ability and raises text classification accuracy.
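The abstract does not give the update rule itself, so the following is only a minimal sketch of what a fixed-point iteration with a Gamma prior could look like. It assumes the standard Minka-style fixed-point update for a Dirichlet-multinomial, turned into a MAP update by placing a Gamma(shape, rate) prior on each component of the asymmetric document-topic hyperparameter. The function names `digamma` and `update_alpha`, the parameter names, and the toy counts are illustrative assumptions, not the authors' implementation.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift the argument up using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def update_alpha(counts, alpha, shape=1.0, rate=0.0, iters=200, tol=1e-8):
    """Fixed-point MAP estimate of an asymmetric Dirichlet hyperparameter.

    counts[d][k]: (possibly fractional) topic-k mass in document d, e.g. soft
                  counts accumulated from belief propagation messages (assumed).
    alpha[k]:     initial hyperparameter values, all positive.
    shape, rate:  Gamma prior on each alpha[k]; shape=1, rate=0 reduces to
                  the plain maximum-likelihood fixed-point update.
    """
    alpha = list(alpha)
    totals = [sum(row) for row in counts]
    for _ in range(iters):
        alpha_sum = sum(alpha)
        # Denominator is shared by all topics.
        den = sum(digamma(n + alpha_sum) - digamma(alpha_sum) for n in totals)
        new_alpha = []
        for k, a_k in enumerate(alpha):
            num = sum(digamma(row[k] + a_k) - digamma(a_k) for row in counts)
            # The Gamma prior adds (shape - 1) to the numerator and rate to the
            # denominator; the floor keeps every component strictly positive.
            new_alpha.append(max((a_k * num + shape - 1.0) / (den + rate), 1e-12))
        converged = max(abs(n - o) for n, o in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Toy usage: five documents, two topics, topic 0 dominant in most documents.
toy_counts = [[10, 0], [2, 8], [9, 1], [10, 0], [5, 5]]
learned = update_alpha(toy_counts, [1.0, 1.0], shape=1.0, rate=0.1)
```

A symmetric prior would correspond to tying all components of `alpha` to one shared value; the sketch keeps them separate to illustrate the asymmetric case that the abstract reports as the better-performing one.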

Authors: Wu Xiaona (吴晓娜), Yan Jianfeng (严建峰), Liu Xiaosheng (刘晓升)

Affiliation: School of Computer Science and Technology, Soochow University

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 8, pp. 220-223, 275 (5 pages in total)

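Funding: National Natural Science Foundation of China (61003154, 61373092, 61033013, 61272449, 61202029); Natural Science Research Project of Jiangsu Higher Education Institutions (11KJB520018); Major Project of the Jiangsu Provincial Department of Education (12KJA520004); Soochow University Innovation Team Project (SDT2012B02); Open Project of the Guangdong Provincial Key Laboratory (SZU-GDPHPCL-2012-09)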

Keywords: LDA; belief propagation algorithm; symmetric prior; asymmetric prior

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
