
Novel ensemble learning based on multiple section distribution in distributed environment

Abstract: Because most ensemble learning algorithms use a centralized model, the training instances must be gathered at a single station, and it is often difficult to centralize the training data in this way. A distributed ensemble learning algorithm is proposed in which each instance carries two kinds of weight genes, denoting the global distribution and the local distribution. Instead of the repeated sampling used in standard ensemble learning, non-balanced sampling at each station is used to train that station's set of base classifiers. The concept of the effective nearby region of a local integration classifier is introduced and applied to the dynamic integration of multiple classifiers in a distributed environment. Experiments show that the proposed algorithm effectively reduces the time required to train the base classifiers while maintaining classification performance on par with centralized learning (an illustrative sketch of such a scheme follows the record details below).
Author: Fang Min
Affiliation: Inst. of Computer
Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, No. 2, pp. 377-380 (4 pages)
Funding: Natural Science Foundation of Shaanxi Province (2005F51).
Keywords: distributed environment, ensemble learning, multiple classifiers combination
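
The abstract describes the algorithm only at a high level: each station trains its base classifiers on non-balanced, weight-driven subsamples of its local data, and prediction dynamically combines only those classifiers whose effective nearby region covers the query instance. The Python sketch below is one possible reading of that scheme under stated assumptions, not the authors' implementation; the `Station` class, the `dynamic_predict` function, the random per-instance weights standing in for the global/local weight genes, and the mean-distance radius used as the effective nearby region are all illustrative stand-ins.

```python
# Illustrative sketch of a distributed ensemble with per-station base classifiers
# and region-restricted dynamic combination. All names and the weighting/radius
# rules are assumptions made for this example, not the paper's exact method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class Station:
    """Holds one site's local data and the base classifiers trained on it."""
    def __init__(self, X, y, n_classifiers=5, rng=None):
        self.X, self.y = X, y
        self.rng = rng if rng is not None else np.random.default_rng()
        self.classifiers, self.centroids, self.radii = [], [], []
        for _ in range(n_classifiers):
            # Non-balanced sampling: draw a weighted (not uniform bootstrap) subsample.
            weights = self.rng.random(len(X))      # stand-in for global/local weight genes
            weights /= weights.sum()
            idx = self.rng.choice(len(X), size=len(X), replace=True, p=weights)
            clf = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
            self.classifiers.append(clf)
            center = X[idx].mean(axis=0)
            self.centroids.append(center)
            # "Effective nearby region": here, the mean distance of the subsample
            # to its centroid defines the region in which this classifier votes.
            self.radii.append(np.linalg.norm(X[idx] - center, axis=1).mean())

def dynamic_predict(stations, x):
    """Combine only those base classifiers whose effective nearby region covers x."""
    votes = {}
    for st in stations:
        for clf, c, r in zip(st.classifiers, st.centroids, st.radii):
            if np.linalg.norm(x - c) <= r:          # x lies inside this classifier's region
                label = clf.predict(x.reshape(1, -1))[0]
                votes[label] = votes.get(label, 0) + 1
    if not votes:                                   # fall back to all classifiers if none match
        for st in stations:
            for clf in st.classifiers:
                label = clf.predict(x.reshape(1, -1))[0]
                votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy usage: two "stations", each training only on its own slice of a synthetic dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
stations = [Station(X[:100], y[:100], rng=rng), Station(X[100:], y[100:], rng=rng)]
print(dynamic_predict(stations, np.array([0.5, 0.5])))
```

In this reading, distribution is only simulated by keeping each station's data separate; in an actual deployment each Station would be trained on its own node, and only the fitted classifiers, centroids, and radii would need to be exchanged for combination, which is what saves the cost of centralizing the raw training data.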