Abstract
Classification is a central problem in spatial data analysis. Classifying with a Bayesian network makes full use of existing knowledge and yields a more accurate classification of targets. As the volume of sample data available for Bayesian network learning grows, the classifier's results become more accurate, but the processing time of every step (structure learning, parameter learning, and classification inference) grows as well, so parallel computing urgently needs to be introduced into Bayesian network learning and classification prediction. This study develops a parallel Bayesian classifier for massive spatial data. Vector data serialization, partitioning by spatial topological relationships, and extended MPI-based parallel primitives address the problems of vector data transmission between nodes, load balancing, and asynchronous I/O in the parallel computation. Experimental results show that the parallel Bayesian classifier substantially shortens the time required for learning and classification prediction while keeping the results consistent.
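The consistency claim in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a naive Bayes classifier over discrete attributes, uses Python threads as a stand-in for MPI processes, a round-robin split as a stand-in for the topology-based partitioning, and a plain merge loop as a stand-in for an MPI reduction. All function names and the sample data are hypothetical. The point is only that the counts behind the parameters are additive across data blocks, so merging per-block counts reproduces the serial counts exactly.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_counts(block):
    """Count labels and (attribute index, value, label) triples in one block."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in block:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(index, value, label)] += 1
    return class_counts, feature_counts


def merge(partials):
    """Sum the per-block counts (plays the role of a reduction step)."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in partials:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def split(samples, n_blocks):
    """Round-robin split; an assumed placeholder for spatial partitioning."""
    return [samples[i::n_blocks] for i in range(n_blocks)]


def parallel_counts(samples, n_blocks=4):
    """Count each block in its own worker, then merge the partial results."""
    blocks = split(samples, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        partials = list(pool.map(local_counts, blocks))
    return merge(partials)


# Hypothetical samples: (land cover, slope) attributes with a class label.
SAMPLES = [
    (("forest", "steep"), "protected"),
    (("forest", "flat"), "protected"),
    (("crop", "flat"), "farmland"),
    (("crop", "flat"), "farmland"),
    (("crop", "steep"), "protected"),
    (("urban", "flat"), "built-up"),
    (("urban", "flat"), "built-up"),
]

serial_result = local_counts(SAMPLES)
parallel_result = parallel_counts(SAMPLES, n_blocks=3)
```

Under these assumptions `serial_result` and `parallel_result` hold the same counts, so any probability estimates derived from them are identical regardless of how the samples were split.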
Source
Geography and Geo-Information Science (《地理与地理信息科学》)
CSCD
Peking University Core Journals (北大核心)
2013, No. 4, pp. 47-51, 85 (6 pages)
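Funding
National High Technology Research and Development Program of China (863 Program) (2011AA120305, 2011AA120302, 2011AA12A401-1)
National Natural Science Foundation of China (41171344/D010703)
Marine Public Welfare Research Project (201105033-6)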
Keywords
parallel computing
Bayesian network
classification
EM algorithm
hill climbing algorithm
Naive Bayes algorithm