Abstract
Parallel decision tree classification algorithms based on the peer-to-peer mode incur high communication costs. To address this problem, this paper proposes FPM-DT, a fast, scalable parallel decision tree mining algorithm based on the master/slave mode. The algorithm combines horizontal and vertical partitioning of the training dataset and groups nodes according to the distribution of data in each branch; the authors describe it as the first algorithm to introduce this combination of techniques. Experimental results show that, compared with the parallel SPRINT classification algorithm in peer-to-peer mode, FPM-DT has lower communication and I/O costs and better scaleup and speedup.
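The abstract does not say how FPM-DT combines the two partitioning models, so the following is only a generic sketch of what horizontal and vertical partitioning of a training set mean, not the authors' method. The function names `partition_horizontally` and `partition_vertically`, the contiguous-block split, and the round-robin assignment of attributes are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def partition_horizontally(rows: Sequence[Row], n_workers: int) -> List[List[Row]]:
    """Split the records into n_workers contiguous, near-equal blocks (hypothetical helper)."""
    base, extra = divmod(len(rows), n_workers)
    blocks: List[List[Row]] = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional record.
        size = base + (1 if w < extra else 0)
        blocks.append(list(rows[start:start + size]))
        start += size
    return blocks


def partition_vertically(
    rows: Sequence[Row], n_workers: int
) -> List[Dict[int, List[Tuple[int, float]]]]:
    """Assign whole attribute columns to workers round-robin (hypothetical helper).

    Each column is stored as (record_id, value) pairs so that columns held by
    different workers can be matched up again by record id.
    """
    n_attrs = len(rows[0]) if rows else 0
    parts: List[Dict[int, List[Tuple[int, float]]]] = [dict() for _ in range(n_workers)]
    for a in range(n_attrs):
        parts[a % n_workers][a] = [(rid, row[a]) for rid, row in enumerate(rows)]
    return parts


rows = [
    (1.0, 2.0, 3.0),
    (4.0, 5.0, 6.0),
    (7.0, 8.0, 9.0),
    (10.0, 11.0, 12.0),
    (13.0, 14.0, 15.0),
]
horizontal = partition_horizontally(rows, 2)  # each worker holds some complete records
vertical = partition_vertically(rows, 2)      # each worker holds some complete attributes
```

In the horizontal case every worker sees all attributes for a subset of records; in the vertical case every worker sees all records for a subset of attributes, which is why the record id travels with each value.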
Source
《西南民族大学学报(自然科学版)》
CAS
2007, No. 4, pp. 743-745 (3 pages)
Journal of Southwest Minzu University(Natural Science Edition)
Keywords
master/slave mode
decision tree
parallel
classification