
Ensemble classification based on feature drifting in data streams

Cited by: 5
Abstract: To build more effective classifiers for data streams with hidden concept drift, and on the premise that different features differ in how critical they are to classification, this paper proposes an Ensemble Classifier for Feature Drifting in data streams (ECFD). First, feature drifting is defined and its relationship to concept drifting is described. Second, mutual information theory is used to derive an Unsupervised Feature Filter (UFF) technique suited to data streams, which extracts critical feature subsets in order to detect feature drifting. Finally, base classification algorithms capable of handling concept drifting are selected to build a heterogeneous ensemble classifier on the critical feature subsets. The method offers a new approach to classifying high-dimensional data streams with hidden concept drift. Extensive experimental results show that the method performs well in accuracy, running speed, and scalability, especially on high-dimensional data streams.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2014, No. 5, pp. 977-985 (9 pages)
Keywords: feature selection; feature drifting; concept drifting; data stream; mutual information; ensemble classifier
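The abstract names the steps (a mutual-information-based unsupervised filter, then feature-drift detection on the key-feature subset) but does not give the UFF scoring rule or the drift test. The following is therefore only a minimal sketch under stated assumptions, not the paper's method: features are discrete, a feature is scored by its mean mutual information with every other feature in the current chunk, and a drift is flagged whenever the top-k subset changes between consecutive chunks. The names `mutual_information`, `key_features`, and `feature_drift` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def key_features(chunk, k):
    """Assumed label-free score: mean MI of a feature with all other features.

    Returns the set of indices of the k highest-scoring features.
    """
    cols = list(zip(*chunk))  # column-wise view of the chunk
    d = len(cols)
    scores = [
        sum(mutual_information(cols[i], cols[j]) for j in range(d) if j != i)
        / (d - 1)
        for i in range(d)
    ]
    ranked = sorted(range(d), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def feature_drift(prev_keys, curr_keys):
    """Assumed drift test: the key-feature subset differs between chunks."""
    return prev_keys != curr_keys


# Two toy chunks (rows are instances, columns are features).
# In chunk_a features 0 and 1 are redundant; in chunk_b features 1 and 2 are.
chunk_a = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
chunk_b = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]

keys_a = key_features(chunk_a, k=2)
keys_b = key_features(chunk_b, k=2)
print(keys_a, keys_b, feature_drift(keys_a, keys_b))
```

In a full pipeline along the lines the abstract describes, a detected drift would be the point at which ensemble members are retrained or replaced on the new key-feature subset; that step is omitted here because the abstract does not specify it.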
