Semi-Supervised Learning Based Ensemble Classifier for Stream Data

Cited by: 18
Abstract: Most existing stream data classification algorithms rely on supervised learning and need large amounts of labeled data to train the classifier. Because labeled data is costly to acquire in a real streaming environment, these approaches are impractical. To address this problem, a semi-supervised learning based ensemble classification algorithm, SEClass, is proposed. SEClass uses a small amount of labeled data together with a large amount of unlabeled data to train and update an ensemble classifier, and it classifies test data by majority voting. Experimental results show that, given the same number of labeled training instances, the accuracy of SEClass is on average 5.33% higher than that of the state-of-the-art supervised ensemble classification algorithm. Its running time grows linearly with the attribute dimensionality and the number of class labels, so it is suitable for classifying high-dimensional, high-speed data streams.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list, 2012, No. 2, pp. 292-299 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 60673024)
Keywords: Attribute Weighting, Concept Drift, Ensemble Classifier, Homogeneity, K-means Clustering, Semi-Supervised Learning, Stream Data Classification
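
The abstract states that SEClass labels test data by majority voting over the members of its ensemble. The following is a minimal sketch of that voting step only, not the authors' implementation. The `Classifier` type, the `majority_vote` function, and the toy threshold rules are all hypothetical names introduced here for illustration. How SEClass trains and updates its base classifiers is not described in this record and is not modeled.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical type: a base classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(ensemble: Sequence[Classifier], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the largest number of ensemble members."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one classifier")
    # Count one vote per base classifier.
    votes = Counter(clf(x) for clf in ensemble)
    # most_common(1) yields the label with the highest vote count;
    # on a tie, the label that was voted for first wins.
    label, _ = votes.most_common(1)[0]
    return label


# Toy base classifiers (illustrative threshold rules, not taken from the paper).
ensemble = [
    lambda x: "A" if x[0] > 0.5 else "B",
    lambda x: "A" if x[1] > 0.5 else "B",
    lambda x: "A" if sum(x) > 1.0 else "B",
]

print(majority_vote(ensemble, [0.9, 0.2]))  # two of three members vote "A"
```

In a streaming setting the `ensemble` list would be refreshed as new data chunks arrive, while the voting step itself stays unchanged.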