Abstract
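To improve on the accuracy of the traditional naive Bayes classifier in meteorological data mining, and to process massive data more efficiently, an improved data mining algorithm based on discrete Bayesian networks on the Hadoop platform is proposed. The algorithm does not require attributes to be mutually independent, and it takes full advantage of Hadoop's suitability for big-data processing: predictors are selected from massive data to train a Bayesian network classifier model that predicts temperature. Experimental results show that the algorithm not only achieves markedly higher prediction accuracy than the naive Bayes algorithm currently used in short-term climate prediction, but also greatly improves computational efficiency.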
To improve the accuracy of meteorological data mining over the traditional naive Bayesian classifier, and to handle massive data more efficiently, this paper proposes an improved algorithm based on a discrete Bayesian network to predict temperature. The algorithm removes the naive Bayesian method's requirement that attributes be independent of each other, and exploits the Hadoop platform's strengths in processing large data. Using massive meteorological data, it selects predictors and trains the Bayesian network classification model on the Hadoop platform. The experiments show that the improved algorithm not only achieves significantly higher accuracy than short-term climate prediction using naive Bayesian analysis, regression analysis, and cluster analysis, but also greatly improves computational efficiency.
Source
《电子器件》
CAS
PKU Core (北大核心, Peking University core journals index)
2016, No. 4, pp. 841-846 (6 pages)
Chinese Journal of Electron Devices