Abstract
Concept drift is a major obstacle in data stream mining. The classic SEA algorithm captures concept drift effectively by dynamically pruning its ensemble classifier. Its pruning strategy simply deletes the base classifier with the lowest weight, which means the algorithm discards a concept it has already learned; when that concept reappears it must be learned again, which lowers the algorithm's efficiency. This paper proposes ECRRC (Ensemble Classifiers Retrieving Repeated Concept), an algorithm that can retrieve old concepts and reuse old classifiers, and presents concrete methods for storing and retrieving concepts. When a concept recurs, ECRRC can classify the data stream without relearning it. Experimental results show that ECRRC improves the efficiency of data stream classification.
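The contrast between deleting and keeping the lowest-weight base classifier can be illustrated with a minimal sketch. This is not the paper's ECRRC procedure, whose storage and retrieval methods the abstract does not spell out. Everything here is a hypothetical assumption made for illustration: the names `ThresholdClassifier`, `train_threshold`, `ArchivingEnsemble`, and `reuse_threshold`, the toy one-dimensional data, and the use of accuracy on the latest chunk as the weight.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Chunk = List[Tuple[float, int]]  # (feature, label) pairs; toy 1-D data


@dataclass
class ThresholdClassifier:
    """Toy base classifier (hypothetical): one threshold, one orientation."""
    threshold: float
    positive_above: bool

    def predict(self, x: float) -> int:
        return int((x >= self.threshold) == self.positive_above)


def accuracy(clf: ThresholdClassifier, chunk: Chunk) -> float:
    return sum(clf.predict(x) == y for x, y in chunk) / len(chunk)


def train_threshold(chunk: Chunk) -> ThresholdClassifier:
    """Toy trainer (hypothetical): pick the better orientation at 0.5."""
    candidates = [ThresholdClassifier(0.5, True), ThresholdClassifier(0.5, False)]
    return max(candidates, key=lambda c: accuracy(c, chunk))


@dataclass
class ArchivingEnsemble:
    capacity: int                  # maximum number of active base classifiers
    reuse_threshold: float = 0.9   # assumed cut-off for reusing an archived one
    active: List[ThresholdClassifier] = field(default_factory=list)
    archive: List[ThresholdClassifier] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def update(self, chunk: Chunk,
               train: Callable[[Chunk], ThresholdClassifier]) -> str:
        """Process one labelled chunk; report whether a classifier was reused."""
        best = max(self.archive, key=lambda c: accuracy(c, chunk), default=None)
        if best is not None and accuracy(best, chunk) >= self.reuse_threshold:
            # An archived classifier already fits this chunk: reuse it.
            self.archive.remove(best)
            self.active.append(best)
            outcome = "retrieved"
        else:
            self.active.append(train(chunk))
            outcome = "trained"
        # Prune by weight, but archive the pruned classifier instead of deleting it.
        while len(self.active) > self.capacity:
            worst = min(self.active, key=lambda c: accuracy(c, chunk))
            self.active.remove(worst)
            self.archive.append(worst)
        self.weights = [accuracy(c, chunk) for c in self.active]
        return outcome

    def predict(self, x: float) -> int:
        """Weighted vote of the active classifiers."""
        score = sum(w if c.predict(x) == 1 else -w
                    for c, w in zip(self.active, self.weights))
        return int(score > 0)
```

With a capacity of one, feeding a chunk from concept A, then a chunk from the opposite concept B, then A again lets the third update reuse the archived classifier for A rather than train a new one. A fuller design would also check whether the active ensemble still fits the incoming chunk before consulting the archive, and would bound the archive size.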
Source
Science Technology and Engineering (《科学技术与工程》)
2010, No. 18, pp. 4521-4524, 4529 (5 pages)
Keywords
data stream classification
ensemble classifier
concept drift