Improved Parallel Deep Forest Algorithm Combining with Information Theory (cited by: 1)
Abstract: Parallel deep forest algorithms applied to big data suffer from too many redundant and irrelevant features, imbalanced multi-grained scanning, and low parallelization efficiency. To address these problems, this paper proposes IPDFIT (improved parallel deep forest based on information theory), a parallel deep forest algorithm for big data environments. First, the algorithm designs a hybrid dimension reduction strategy based on information theory, DRIT (dimension reduction based on information theory), which produces a reduced data set with fewer redundant and irrelevant features. Second, it proposes an improved multi-grained scanning strategy, IMGSS (improved multi-grained scanning strategy), which scans the samples so that every feature appears in the resulting data subsets with the same frequency, avoiding the effect of imbalanced multi-grained scanning on the deep forest model. Finally, it uses the MapReduce framework to train the random forest models in each layer of the cascade in parallel, and proposes a sample weighting strategy, TSWS (the sample weighting strategy), which evaluates the samples with the random forest models in the cascade and passes only the poorly evaluated samples to the next layer for training. This gradually reduces the number of training samples at each layer and thereby improves the parallel efficiency of the algorithm. Experimental results show that the algorithm achieves better classification performance in big data environments, especially on data sets with many features.
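The scanning imbalance mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how IMGSS equalizes feature frequency, so the wrap-around window below is only an assumed stand-in, not the authors' method. The function names `sliding_windows`, `circular_windows` and `feature_counts` are hypothetical.

```python
from collections import Counter


def sliding_windows(n_features, window, stride=1):
    """Plain multi-grained scanning: index windows that stop at the right edge."""
    return [list(range(start, start + window))
            for start in range(0, n_features - window + 1, stride)]


def circular_windows(n_features, window):
    """Assumed balanced variant (not from the paper): windows wrap around,
    with one window starting at every feature index."""
    return [[(start + k) % n_features for k in range(window)]
            for start in range(n_features)]


def feature_counts(windows):
    """Count how many windows each feature index appears in."""
    return Counter(index for win in windows for index in win)


# Six features scanned with a window of width three.
plain = feature_counts(sliding_windows(6, 3))
balanced = feature_counts(circular_windows(6, 3))

print(sorted(plain.items()))     # edge features are covered less often
print(sorted(balanced.items()))  # every feature is covered equally often
```

With six features and a window of three, plain scanning covers features 0 and 5 once each but features 2 and 3 three times each, while the wrap-around variant covers every feature exactly three times. This is the kind of equal-frequency property the abstract attributes to IMGSS.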
Authors: MAO Yimin (毛伊敏); GENG Junhao (耿俊豪); CHEN Liang (陈亮) (School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China; School of Applied Science, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2022, No. 7, pp. 106-115 (10 pages)
Funding: National Key Research and Development Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ209406).
Keywords: MapReduce framework; deep forest; DRIT strategy; IMGSS strategy; TSWS strategy