
Accelerating hierarchical distributed latent Dirichlet allocation algorithm by parallel GPU
Abstract: Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA) is a text classification algorithm based on a probabilistic generative model that improves on Latent Dirichlet Allocation (LDA): whereas LDA runs only on a single machine, HD-LDA runs on a distributed framework and supports distributed parallel processing. Mahout implements HD-LDA on Hadoop, but the computation on each node remains heavy, so classifying large document collections still takes too long. Although a large collection is split across multiple nodes for iterative inference, the inference over the documents held by a single node is still performed sequentially, so processing a large-scale collection still needs a long time to classify all the documents. To address this, Hadoop was combined with Graphics Processing Units (GPUs): the inference over each node's document collection was moved onto the GPU so that multiple documents on one node are inferred in parallel, and multiple GPUs working in parallel accelerate the HD-LDA algorithm. Application results show that this method achieves a speedup of about 7x for HD-LDA on large-scale document collections under the distributed framework.
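The key idea described in the abstract is that each Hadoop worker node hands its shard of documents to a GPU, which then infers many documents in parallel instead of sequentially. The sketch below illustrates one common way such per-document parallelism is organized on a GPU: one CUDA thread block resamples the topic assignments of one document while the topic-word distribution is held fixed for the sweep. This is a minimal illustration only, not the authors' actual implementation; the kernel name, data layout, and parameters are assumptions, and a collapsed Gibbs sweep is used here purely for concreteness since the abstract does not specify the exact inference routine.

```cuda
// Illustrative sketch (assumed, not the paper's code): one thread block per document,
// so all documents assigned to a Hadoop worker node are resampled in parallel on its GPU.
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void sample_documents(const int *doc_off,    // D+1 token offsets, one range per document
                                 const int *doc_words,  // word id of every token
                                 int *topic_assign,     // current topic of every token
                                 int *doc_topic,        // D x K document-topic counts
                                 const float *phi,      // K x V topic-word probabilities (fixed this sweep)
                                 int K, int V, float alpha,
                                 unsigned long long seed)
{
    int d = blockIdx.x;                  // one block = one document
    if (threadIdx.x != 0) return;        // keep the sketch simple: thread 0 samples its document serially

    extern __shared__ float p[];         // K unnormalized topic probabilities (K * sizeof(float) at launch)
    curandState rng;
    curand_init(seed, d, 0, &rng);

    for (int i = doc_off[d]; i < doc_off[d + 1]; ++i) {
        int w = doc_words[i];
        int old = topic_assign[i];
        doc_topic[d * K + old] -= 1;     // remove the token's current assignment

        // unnormalized collapsed-Gibbs posterior p(z_i = k | rest)
        float sum = 0.f;
        for (int k = 0; k < K; ++k) {
            p[k] = (doc_topic[d * K + k] + alpha) * phi[k * V + w];
            sum += p[k];
        }

        // draw the new topic by inverse-CDF sampling
        float u = curand_uniform(&rng) * sum, acc = 0.f;
        int knew = K - 1;
        for (int k = 0; k < K; ++k) { acc += p[k]; if (u <= acc) { knew = k; break; } }

        topic_assign[i] = knew;
        doc_topic[d * K + knew] += 1;
    }
}

// Host-side launch for the D documents held by one node (one block per document):
//   sample_documents<<<D, 1, K * sizeof(float)>>>(doc_off, doc_words, topic_assign,
//                                                 doc_topic, phi, K, V, alpha, seed);
```

Because each node launches such a kernel over only its own shard of the corpus, several GPUs run in parallel across the cluster, which is the combination of Hadoop-level and GPU-level parallelism the abstract credits for the roughly 7x speedup.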
Source: Journal of Computer Applications (《计算机应用》), CSCD / Peking University Core Journal, 2013, No. 12, pp. 3313-3316, 3330 (5 pages).
Funding: National Key Technology R&D Program of China (2011BAH14B02); National Science and Technology Major Project for Core Electronic Devices, High-end Generic Chips and Basic Software (2012ZX01039-004); Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX2-YW-174); Major Science and Technology Project of the General Administration of Press and Publication (GAPP-ZDKJ-ZK/23).
Keywords: Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA); Latent Dirichlet Allocation (LDA); text classification; distributed framework; parallel Graphics Processing Unit (GPU)


