
基于海量高维煤炭数据的分布式贝叶斯算法的研究与应用

Research and Application of Large Scale and High Dimensional Coal Data Based Distributed Bayesian Algorithm
Abstract: With the rapid development of information technology, the coal industry has accumulated a large amount of coal data. Managers in the coal industry hope to use this existing data for analysis and prediction, but processing and analyzing data at this scale is a major difficulty. This paper addresses the classification of coal data and proposes a distributed Bayesian classification algorithm (Bayesian logistic regression) based on the MapReduce distributed computing framework. The algorithm carries out the classification task in a distributed manner and can process large-scale data faster and more effectively. The experimental results show that the proposed distributed algorithm is highly efficient, achieves a clear speed-up over the traditional algorithm, and scales well.
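The abstract does not describe how the computation is split across workers, so the following is only a minimal sketch of one common way to fit a Bayesian logistic regression in a map/reduce style. It assumes a Gaussian prior on the weights, a MAP estimate obtained by batch gradient descent, and data already split into partitions; the function names (`map_gradient`, `reduce_gradients`, `fit`, `predict`) and all parameter values are hypothetical and are not taken from the paper. Each map step computes the likelihood gradient on one partition, and the reduce step sums the partial gradients.

```python
import math
from functools import reduce


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_gradient(partition, w):
    """Map step: gradient of the negative log-likelihood on one partition.

    `partition` is a list of (features, label) pairs with label in {0, 1}.
    """
    grad = [0.0] * len(w)
    for x, y in partition:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def reduce_gradients(g1, g2):
    """Reduce step: element-wise sum of two partial gradients."""
    return [a + b for a, b in zip(g1, g2)]


def fit(partitions, dim, prior_var=10.0, lr=0.5, iters=200):
    """MAP estimate under an assumed N(0, prior_var) prior on each weight."""
    w = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(iters):
        # On a real cluster each map call would run on a separate worker;
        # here they run sequentially for illustration.
        partials = [map_gradient(p, w) for p in partitions]
        total = reduce(reduce_gradients, partials)
        # The Gaussian prior contributes w / prior_var to the gradient.
        w = [wi - lr * (gi + wi / prior_var) / n for wi, gi in zip(w, total)]
    return w


def predict(w, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


# Toy data: each sample is ([bias, feature], label), split into two partitions.
partitions = [
    [([1.0, -2.0], 0), ([1.0, 1.0], 1)],
    [([1.0, -1.0], 0), ([1.0, 2.0], 1)],
]
weights = fit(partitions, dim=2)
```

Because the likelihood gradient is a sum over samples, summing the per-partition gradients gives the same result as a single pass over all the data, which is what makes this step easy to distribute. Any speed-up in practice depends on cluster size and communication cost, which this sketch does not model.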
Author: 刘小强 (Liu Xiaoqiang)
Source: 《煤炭技术》 (Coal Technology), indexed in CAS and the Peking University Core Journals list (北大核心), 2013, No. 9, pp. 184-186 (3 pages)
Keywords: MapReduce; text classification; website; Bayesian; distributed
