
Learning from distribution-changing data streams via decision tree model reuse (基于决策树模型重用的分布变化流数据学习)

Cited by: 14
Abstract: In many real-world applications, data are collected in the form of streams. Because the environments in which the data are collected tend to change dynamically, the distribution of a data stream generally changes over time. Conventional machine learning approaches rely on the assumption that data are independent and identically distributed, so they perform poorly when learning from such distribution-changing streams. This paper proposes an algorithm based on decision tree model reuse for learning from distribution-changing data streams. The proposed algorithm is an online ensemble method: it maintains a model pool and updates the pool through a decision tree model reuse mechanism. The main idea is to extract, from historical data, the knowledge relevant to the current learning task and thereby resist the negative effects of distribution changes. Experiments on synthetic and real-world datasets validate the effectiveness of the proposed approach.
Authors: Peng ZHAO (赵鹏); Zhi-Hua ZHOU (周志华) (National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China)
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 1, pp. 1-12 (12 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61921006).
Keywords: machine learning; distribution change; data stream; model reuse; ensemble methods; dynamic environments











