
MapReduce Based Parallel Probabilistic Latent Semantic Analysis for Text Mining
Times cited: 7
Abstract: PLSA (Probabilistic Latent Semantic Analysis) is a typical topic model, but its complex modeling process makes it hard to apply to massive data. To address this limitation of serial PLSA, this paper proposes a parallel PLSA algorithm based on the MapReduce computing framework. The algorithm handles the parallel processing of large-scale data in a concise, distributed form. It is then applied to two text mining tasks, text clustering and semantic analysis. Experimental results show that the algorithm performs well and scales when processing large datasets.
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2015, No. 2, pp. 79-86 (8 pages)
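Funding: National Natural Science Foundation of China (61175052, 61203297, 61035003); National High Technology Research and Development Program of China (863 Program) (2014AA012205, 2013AA01A606, 2012AA011003)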
Keywords: probabilistic topic model; MapReduce; parallel computing; semantic analysis; probabilistic latent semantic analysis; text clustering
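The abstract does not spell out how the authors divide PLSA's EM iterations between map and reduce tasks. The following is a minimal sketch of one common decomposition, written under that assumption and not as the paper's method. The E-step runs independently per document in the mapper, which emits expected counts n(d,w)·P(z|d,w) keyed by (z, w) and (d, z). The reducer sums those partial counts, and a normalization pass performs the M-step. Hadoop itself is not used; the shuffle is simulated in-process. The function names (`map_document`, `shuffle`, `reduce_counts`, `m_step`, `plsa`) and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def map_document(doc_id, word_counts, p_z_d, p_w_z, num_topics):
    """Hypothetical mapper: E-step for one document.

    Emits (key, expected_count) pairs, where expected_count = n(d, w) * P(z | d, w)
    and P(z | d, w) is proportional to P(z | d) * P(w | z).
    """
    for word, n_dw in word_counts.items():
        joint = [p_z_d[doc_id][z] * p_w_z[z].get(word, 0.0) for z in range(num_topics)]
        total = sum(joint)
        if total == 0.0:
            continue
        for z in range(num_topics):
            expected = n_dw * joint[z] / total
            yield ("wz", z, word), expected    # partial count for P(w | z)
            yield ("dz", doc_id, z), expected  # partial count for P(z | d)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle phase would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Hypothetical reducer: sum the partial expected counts for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def normalize(values):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def m_step(counts, doc_ids, vocab, num_topics):
    """M-step: turn the summed expected counts into P(w | z) and P(z | d)."""
    p_w_z = []
    for z in range(num_topics):
        row = normalize([counts.get(("wz", z, w), 0.0) for w in vocab])
        p_w_z.append(dict(zip(vocab, row)))
    p_z_d = {d: normalize([counts.get(("dz", d, z), 0.0) for z in range(num_topics)])
             for d in doc_ids}
    return p_z_d, p_w_z


def plsa(docs, num_topics, iterations=50, seed=0):
    """Run PLSA's EM loop, expressing each E-step as map -> shuffle -> reduce."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in docs.values() for w in counts})
    # Random positive initialization of both conditional distributions.
    p_z_d = {d: normalize([rng.random() + 0.1 for _ in range(num_topics)]) for d in docs}
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(num_topics)]
    for _ in range(iterations):
        pairs = [pair for d, counts in docs.items()
                 for pair in map_document(d, counts, p_z_d, p_w_z, num_topics)]
        counts = reduce_counts(shuffle(pairs))
        p_z_d, p_w_z = m_step(counts, list(docs), vocab, num_topics)
    return p_z_d, p_w_z


if __name__ == "__main__":
    corpus = {
        "d1": {"map": 3, "reduce": 2, "hadoop": 2},
        "d2": {"hadoop": 2, "map": 1, "cluster": 2},
        "d3": {"topic": 3, "model": 2, "latent": 2},
        "d4": {"latent": 1, "topic": 2, "semantic": 3},
    }
    p_z_d, p_w_z = plsa(corpus, num_topics=2, iterations=30)
    for d, dist in p_z_d.items():
        print(d, [round(p, 3) for p in dist])
```

This split parallelizes because each document's E-step depends only on the current parameters. The M-step, in turn, needs only the per-key totals that reducers produce. In a real cluster job, the current P(w|z) table would also have to reach every mapper at each iteration. That distribution step is outside this sketch.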