
Low-Cost Algorithm for Stream Data Classification

Times cited: 1
Abstract: Most existing classification algorithms for data streams are based on supervised learning, but manually labeling instances that arrive continuously at high speed is costly, which limits their practicality. To address this problem, a low-cost classification algorithm for stream data, named 2SDC, is proposed. 2SDC trains and then updates its classification model using a small number of labeled instances together with a large number of unlabeled instances, and it dynamically monitors the data stream for potential concept drift, adjusting the classification model to the current concept. Experiments on real data streams show that 2SDC achieves classification accuracy comparable to that of current state-of-the-art supervised classification algorithms and adapts to concept drift in the data stream.
Author: 李南 (Li Nan)
Source: Computer Systems & Applications (《计算机系统应用》), 2016, Issue 12, pp. 187-192 (6 pages)
Funding: Natural Science Foundation of Fujian Province (2013J01216, 2016J01280)
Keywords: concept drift; data stream; classification; low-cost; supervised learning
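The abstract does not describe how 2SDC works internally, so the following is not 2SDC. It is a minimal Python sketch, under illustrative assumptions, of the two generic ingredients the abstract names: updating a classifier from a few labeled instances plus many unlabeled ones (here, self-training of a nearest-centroid model, where a hypothetical `margin` parameter decides when a pseudo-label is trusted), and monitoring the stream for concept drift (here, the Drift Detection Method (DDM) of Gama et al., applied to the 0/1 errors on whatever labeled instances arrive). The class names `SelfTrainingCentroid` and `DDM` and all parameters are illustrative choices.

```python
import math


class SelfTrainingCentroid:
    """Illustrative nearest-centroid stream classifier (NOT the 2SDC algorithm).

    Labeled instances update class centroids directly; unlabeled instances
    are absorbed with a pseudo-label only when the prediction is confident.
    """

    def __init__(self, margin=0.5):
        self.margin = margin  # hypothetical: min distance gap to accept a pseudo-label
        self.sums = {}        # label -> per-feature running sums
        self.counts = {}      # label -> number of instances absorbed

    def _absorb(self, x, label):
        # Add x to the running sums of its class (creating the class if new).
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def _ranked(self, x):
        # (distance to class centroid, label) pairs, nearest class first.
        pairs = []
        for label, sums in self.sums.items():
            centroid = [s / self.counts[label] for s in sums]
            pairs.append((math.dist(x, centroid), label))
        return sorted(pairs)

    def predict(self, x):
        # Requires at least one labeled instance to have been learned.
        return self._ranked(x)[0][1]

    def learn_labeled(self, x, y):
        self._absorb(x, y)

    def learn_unlabeled(self, x):
        # Self-training step: accept the nearest class as a pseudo-label only
        # if it is closer than the runner-up by at least `margin`.
        ranked = self._ranked(x)
        if len(ranked) >= 2 and ranked[1][0] - ranked[0][0] >= self.margin:
            self._absorb(x, ranked[0][1])
            return True
        return False


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a 0/1 error stream."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances  # no decision before this many errors are seen
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0               # running error rate
        self.p_min = float("inf")  # error rate at the point where p + s was lowest
        self.s_min = float("inf")  # standard deviation at that point

    def update(self, error):
        """error is 1 for a misclassified labeled instance, else 0.

        Returns "drift", "warning" or "stable".
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            self.reset()  # drift: start monitoring the new concept afresh
            return "drift"
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

In a stream, one would call `learn_labeled` for each labeled instance and feed `int(clf.predict(x) != y)` into `DDM.update`, call `learn_unlabeled` for each unlabeled instance, and rebuild the classifier from recent labeled data when `update` returns `"drift"`. The margin test is one simple guard against reinforcing wrong pseudo-labels; it is a design choice of this sketch, not of the paper.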