A Burstiness-Oriented Comparative Topic Model (cited by: 2)

A comparative topic model for words burstiness


Abstract: State-of-the-art cross-collection topic models share a serious flaw: they cannot capture the tendency of words to appear in bursts. This paper proposes CDCMLDA (Cross-collection Dirichlet Compound Multinomial Latent Dirichlet Allocation), a generative model for topic analysis across text collections. It models word burstiness within each collection using the Dirichlet compound multinomial (DCM) distribution and combines DCM with LDA (Latent Dirichlet Allocation) to analyze how topics differ between collections. Model parameters are inferred with a Monte Carlo Expectation Maximization algorithm. Qualitative and quantitative evaluations on several real-world datasets show that CDCMLDA not only discovers the common and unique aspects of topics across collections, but also achieves markedly better model perplexity than the two main existing cross-collection topic models.
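To make the burstiness argument concrete, the following is a minimal sketch of the standard DCM (Polya) log-probability of a word-count vector, compared with a plain multinomial. It is not the CDCMLDA model or its inference procedure; the function names, the four-word vocabulary, and the parameter values are all illustrative assumptions rather than anything taken from the paper.

```python
from math import lgamma, log, exp

def dcm_log_prob(counts, alpha):
    """Log-probability of a count vector under the DCM (Polya) distribution.

    counts: word counts x_w for one document (illustrative input)
    alpha:  positive Dirichlet parameters alpha_w, same length as counts
    """
    n = sum(counts)
    a = sum(alpha)
    # log of the multinomial coefficient n! / prod_w x_w!
    result = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # log Gamma(A) - log Gamma(n + A), with A = sum_w alpha_w
    result += lgamma(a) - lgamma(n + a)
    # sum_w [log Gamma(x_w + alpha_w) - log Gamma(alpha_w)]
    result += sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return result

def multinomial_log_prob(counts, probs):
    """Log-probability of a count vector under a plain multinomial."""
    n = sum(counts)
    result = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    result += sum(x * log(p) for x, p in zip(counts, probs) if x > 0)
    return result

# Hypothetical toy setting: 4-word vocabulary, documents of 4 tokens.
alpha = [0.1, 0.1, 0.1, 0.1]      # small alpha values encourage bursts
probs = [0.25, 0.25, 0.25, 0.25]  # multinomial with the same mean word frequencies
bursty = [4, 0, 0, 0]             # one word repeated four times
spread = [1, 1, 1, 1]             # every word used once

for name, doc in (("bursty", bursty), ("spread", spread)):
    print(name,
          "DCM:", exp(dcm_log_prob(doc, alpha)),
          "multinomial:", exp(multinomial_log_prob(doc, probs)))
```

With small `alpha` values the DCM assigns more probability to the document that repeats one word than to the document that spreads its tokens evenly, while the multinomial with the same mean frequencies ranks the two documents the other way round. This is the property that a DCM-based topic model is meant to exploit.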

Authors: TAN Wentang (谭文堂), WANG Zhenwen (王桢文), YIN Fengjing (殷风景), GE Bin (葛斌), XIAO Weidong (肖卫东)

Affiliation: Key Laboratory of Information Systems Engineering, National University of Defense Technology

Source: Journal of National University of Defense Technology (《国防科技大学学报》; indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2013, No. 4, pp. 146-155 (10 pages)

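Funding: National Natural Science Foundation of China (60903225); Natural Science Foundation of Hunan Province (11JJ5044); National University of Defense Technology Innovation Fund for Outstanding Graduate Students (S100502)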

Keywords: comparative text mining; burstiness; topic model; CDCMLDA model

Classification: O212.6 [Science: Probability Theory and Mathematical Statistics]





