
Novel ensemble learning based on multiple section distribution in distributed environment

Abstract: Because most ensemble learning algorithms use a centralized model, the training instances must be gathered at a single station, and it is often difficult to centralize the training data in this way. A distributed ensemble learning algorithm is proposed in which each instance carries two kinds of weight genes, denoting the global distribution and the local distribution. Instead of the repeated sampling used in standard ensemble learning, non-balanced sampling at each station is used to train that station's set of base classifiers. The concept of the effective nearby region of a local integration classifier is introduced and applied to the dynamic integration of multiple classifiers in a distributed environment. Experiments show that the proposed algorithm effectively reduces the time required to train the base classifiers while maintaining classification performance on par with centralized learning (an illustrative sketch of such a scheme follows the record details below).
Author: Fang Min
Affiliation: Inst. of Computer
Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, No. 2, pp. 377-380 (4 pages)
Funding: Natural Science Foundation of Shaanxi Province (2005F51).
Keywords: distributed environment, ensemble learning, multiple classifiers combination
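
The abstract describes the algorithm only at a high level: each station trains its base classifiers on non-balanced, weight-driven subsamples of its local data, and prediction dynamically combines only those classifiers whose effective nearby region covers the query instance. The Python sketch below is one possible reading of that scheme under stated assumptions, not the authors' implementation; the `Station` class, the `dynamic_predict` function, the random per-instance weights standing in for the global/local weight genes, and the mean-distance radius used as the effective nearby region are all illustrative stand-ins.

```python
# Illustrative sketch of a distributed ensemble with per-station base classifiers
# and region-restricted dynamic combination. All names and the weighting/radius
# rules are assumptions made for this example, not the paper's exact method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class Station:
    """Holds one site's local data and the base classifiers trained on it."""
    def __init__(self, X, y, n_classifiers=5, rng=None):
        self.X, self.y = X, y
        self.rng = rng if rng is not None else np.random.default_rng()
        self.classifiers, self.centroids, self.radii = [], [], []
        for _ in range(n_classifiers):
            # Non-balanced sampling: draw a weighted (not uniform bootstrap) subsample.
            weights = self.rng.random(len(X))      # stand-in for global/local weight genes
            weights /= weights.sum()
            idx = self.rng.choice(len(X), size=len(X), replace=True, p=weights)
            clf = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
            self.classifiers.append(clf)
            center = X[idx].mean(axis=0)
            self.centroids.append(center)
            # "Effective nearby region": here, the mean distance of the subsample
            # to its centroid defines the region in which this classifier votes.
            self.radii.append(np.linalg.norm(X[idx] - center, axis=1).mean())

def dynamic_predict(stations, x):
    """Combine only those base classifiers whose effective nearby region covers x."""
    votes = {}
    for st in stations:
        for clf, c, r in zip(st.classifiers, st.centroids, st.radii):
            if np.linalg.norm(x - c) <= r:          # x lies inside this classifier's region
                label = clf.predict(x.reshape(1, -1))[0]
                votes[label] = votes.get(label, 0) + 1
    if not votes:                                   # fall back to all classifiers if none match
        for st in stations:
            for clf in st.classifiers:
                label = clf.predict(x.reshape(1, -1))[0]
                votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy usage: two "stations", each training only on its own slice of a synthetic dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
stations = [Station(X[:100], y[:100], rng=rng), Station(X[100:], y[100:], rng=rng)]
print(dynamic_predict(stations, np.array([0.5, 0.5])))
```

In this reading, distribution is only simulated by keeping each station's data separate; in an actual deployment each Station would be trained on its own node, and only the fitted classifiers, centroids, and radii would need to be exchanged for combination, which is what saves the cost of centralizing the raw training data.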