Abstract
Decision trees are an important class of classification algorithms in data mining, but most existing improvements to them build on traditional serial algorithms, which cannot meet the demands of mining massive data sets in a big data environment. To address the low efficiency of serial mining algorithms on large data sets, the decision tree algorithm is parallelized with MapReduce, and correction parameters are introduced to counter the ID3 algorithm's bias toward selecting multi-valued attributes. Experimental results show that the proposed algorithm has good parallelism and scalability and can effectively handle classification problems on large data sets.
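The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a plain in-process map and reduce rather than a real MapReduce cluster, and the toy records, attribute names, and function names are all hypothetical. The abstract does not define the correction parameters, so the sketch does not attempt them. It only shows that ID3 information gain can be computed from merged per-split counts, and that an attribute with many distinct values (here `id`) gets the highest gain.

```python
from collections import Counter
from functools import reduce
from math import log2

# Hypothetical toy records: (attribute dict, class label).
RECORDS = [
    ({"id": "r1", "outlook": "sunny"}, "no"),
    ({"id": "r2", "outlook": "sunny"}, "no"),
    ({"id": "r3", "outlook": "rain"}, "yes"),
    ({"id": "r4", "outlook": "rain"}, "no"),
    ({"id": "r5", "outlook": "overcast"}, "yes"),
    ({"id": "r6", "outlook": "overcast"}, "yes"),
]


def map_partition(records):
    """Map step: count (attribute, value, label) triples in one data split."""
    counts = Counter()
    for attrs, label in records:
        for name, value in attrs.items():
            counts[(name, value, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two splits."""
    return a + b


def entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c)


def information_gain(counts, attribute):
    """Standard ID3 gain for one attribute, using only the merged counts."""
    by_value = {}       # attribute value -> Counter of class labels
    labels = Counter()  # overall class distribution
    for (name, value, label), n in counts.items():
        if name == attribute:
            by_value.setdefault(value, Counter())[label] += n
            labels[label] += n
    total = sum(labels.values())
    remainder = sum(
        sum(c.values()) / total * entropy(c.values()) for c in by_value.values()
    )
    return entropy(labels.values()) - remainder


# Two splits stand in for the input blocks that separate mappers would process.
splits = [RECORDS[:3], RECORDS[3:]]
merged = reduce(reduce_counts, map(map_partition, splits))
gains = {name: information_gain(merged, name) for name in ("id", "outlook")}
```

On this toy data `gains["id"]` exceeds `gains["outlook"]`, although a unique record identifier has no predictive value. This is the multi-valued attribute bias that the abstract says the correction parameters are meant to counter.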
Source
《河南工程学院学报(自然科学版)》
2017, No. 2, pp. 57-61 (5 pages)
Journal of Henan University of Engineering: Natural Science Edition
Funding
Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520004)