Abstract
With the rapid development of information technology, the coal industry has accumulated a large volume of data. Industry managers want to use this data for analysis and prediction, but processing and analyzing data at this scale is a major difficulty. This paper addresses the classification of coal data and proposes a distributed Bayesian classification algorithm (Bayesian logistic regression) built on the MapReduce distributed computing framework. The algorithm performs classification in a distributed manner and can process large-scale data more quickly and effectively. The experimental results indicate that the proposed algorithm is efficient, achieves a clear speed-up over the traditional algorithm, and scales well.
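To illustrate how a Bayesian classifier can be trained in a MapReduce style, here is a minimal single-process sketch. It is not the paper's algorithm: it uses categorical naive Bayes as a simpler stand-in for the Bayesian logistic regression named in the abstract, and it simulates the map and reduce phases with ordinary Python functions rather than a real cluster. The function names, the two features (ash and sulfur level), and the grade labels are hypothetical.

```python
from collections import Counter
from functools import reduce
import math


def map_partition(records):
    """Map step: count labels and (feature, value, label) triples in one data split."""
    counts = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", i, value, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two splits."""
    return a + b


def train(partitions):
    """Run the map step on every split, then fold the results together."""
    return reduce(reduce_counts, map(map_partition, partitions), Counter())


def predict(counts, features, n_values, alpha=1.0):
    """Return the label with the highest log-posterior (Laplace smoothing alpha).

    n_values[i] is the number of distinct values feature i can take.
    """
    labels = [key[1] for key in counts if key[0] == "class"]
    total = sum(counts[("class", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        n_c = counts[("class", c)]
        score = math.log(n_c / total)
        for i, value in enumerate(features):
            numerator = counts[("feat", i, value, c)] + alpha
            score += math.log(numerator / (n_c + alpha * n_values[i]))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical toy data: (ash level, sulfur level) -> grade, in two splits.
partitions = [
    [(("low", "low"), "A"), (("low", "high"), "A")],
    [(("high", "high"), "B"), (("high", "low"), "B"), (("low", "low"), "A")],
]
counts = train(partitions)
grade = predict(counts, ("low", "low"), n_values=[2, 2])
```

Because the model is a set of additive counts, each split can be processed independently and the partial results merged in any order, which is the property that makes this kind of training a natural fit for MapReduce.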
Source
《煤炭技术》
CAS
Peking University Core Journals (北大核心)
2013, Issue 9, pp. 184-186 (3 pages)
Coal Technology