Abstract
As meteorological services become increasingly digitised, meteorological departments have accumulated massive volumes of data, and extracting useful knowledge from these data has become a focus of attention. Meteorological data are high-dimensional and strongly interdependent, which places higher demands on meteorological data mining. Classical data mining algorithms do not achieve satisfactory performance or accuracy when processing massive meteorological data. Based on an analysis of the MapReduce computing model, rough set theory, and Bayesian classification, we present a MapReduce-based data reduction algorithm built on computing equivalence classes, together with a MapReduce-based naive Bayesian classification algorithm. Experiments were then carried out on the Hadoop platform. The results show that this parallel data mining scheme can process massive meteorological data effectively and has good scalability.
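To illustrate what computing equivalence classes in a MapReduce style can look like, here is a minimal single-process sketch in Python. It is not the paper's algorithm: the function names, the toy weather records, and the attribute names (`temp`, `humidity`, `rain`) are all hypothetical, and the map, shuffle, and reduce phases are simulated in memory rather than run on Hadoop. The idea is that records with identical values on a chosen attribute subset are emitted under the same key, so each reducer receives exactly one equivalence class of the rough-set indiscernibility relation.

```python
from collections import defaultdict


def map_phase(records, attributes):
    """Emit (key, record_id) pairs; the key is the record's values on the chosen attributes."""
    for record_id, record in records:
        key = tuple(record[a] for a in attributes)
        yield key, record_id


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Each key's value list is one equivalence class; sort ids for a stable result."""
    return {key: sorted(ids) for key, ids in groups.items()}


def equivalence_classes(records, attributes):
    """Partition record ids by indiscernibility on the given attributes."""
    return reduce_phase(shuffle(map_phase(records, attributes)))


# Hypothetical toy records: (record_id, attribute values).
records = [
    (1, {"temp": "high", "humidity": "high", "rain": "yes"}),
    (2, {"temp": "high", "humidity": "high", "rain": "yes"}),
    (3, {"temp": "low", "humidity": "high", "rain": "no"}),
    (4, {"temp": "low", "humidity": "low", "rain": "no"}),
]

classes = equivalence_classes(records, ["temp", "humidity"])
```

In a rough-set reduction, partitions like this would be compared across attribute subsets: if dropping an attribute leaves the partition's ability to separate the decision values unchanged, that attribute is a candidate for removal. On a real cluster the key would typically be serialised as a string and the grouping handled by the framework's shuffle stage.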
Source
《计算机应用与软件》
CSCD
2015, No. 4, pp. 72-76, 90 (6 pages)
Computer Applications and Software
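Funding
National Natural Science Foundation of China (61363052)
Inner Mongolia Graduate Research Innovation Project (S20131012810)
Natural Science Foundation of the Inner Mongolia Department of Education (NJZY12052)
Key Project of Inner Mongolia University of Technology (ZD201118)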