Research on an Online Classification Algorithm for Streaming Data Based on a Weighted Bayes Classifier (cited by: 3)

Weighted Bayes Based Data Streaming Online Classification Algorithm
Abstract: Traditional classification algorithms require the entire training dataset before the model can be trained. In big-data settings, however, data arrive continuously as a stream, so the full training set cannot be obtained in advance. This paper studies online classification of noisy streaming data in big-data environments. It formulates online classification of streaming data as an optimization problem, proposes a weighted Naive Bayes classifier and an Error Adaptive classifier, and validates the proposed algorithms on two real datasets. The experiments show that, on noise-free streams, the Error Adaptive classifier predicts more accurately than related algorithms. When the stream contains noise, the Error Adaptive classifier is insensitive to it and still predicts accurately, so it is suitable for online classification of streaming data in big-data environments.
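Author: 卢惠林 (Lu Huilin)
Source: 《计算机科学》 (Computer Science), indexed in CSCD and the Peking University Core Journals list, 2014, Issue 5, pp. 227-229, 234 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (61170121)
Keywords: Big data; Decision tree; Classification algorithm; Data streaming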
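
The abstract does not give the weighting scheme or the Error Adaptive update rule, so the following is only a minimal sketch of the general idea of a Naive Bayes classifier that learns from a stream one sample at a time. It is not the paper's algorithm. The class name `OnlineWeightedNB`, the caller-supplied `weight` argument, the Laplace constant `alpha`, and the test-then-train evaluation loop are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class OnlineWeightedNB:
    """Incremental Naive Bayes over categorical features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # Laplace smoothing constant (assumed)
        self.class_w = defaultdict(float)   # total weight seen per class
        self.feat_w = defaultdict(float)    # weight per (class, index, value)
        self.values = defaultdict(set)      # distinct values seen per feature index

    def update(self, x, y, weight=1.0):
        """Fold one labelled sample into the counts; no past data is stored."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        self.class_w[y] += weight
        for i, v in enumerate(x):
            self.feat_w[(y, i, v)] += weight
            self.values[i].add(v)

    def predict(self, x):
        """Return the class with the highest smoothed log-posterior, or None."""
        if not self.class_w:
            return None
        total = sum(self.class_w.values())
        best, best_score = None, -math.inf
        for c, cw in self.class_w.items():
            score = math.log(cw / total)
            for i, v in enumerate(x):
                k = max(len(self.values[i]), 1)
                num = self.feat_w.get((c, i, v), 0.0) + self.alpha
                score += math.log(num / (cw + self.alpha * k))
            if score > best_score:
                best, best_score = c, score
        return best


def prequential_accuracy(model, stream):
    """Test-then-train: predict each sample before learning from it."""
    correct = seen = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        seen += 1
        model.update(x, y)
    return correct / seen if seen else 0.0
```

Because the model keeps only running weighted counts, memory use does not grow with the length of the stream. A weighting policy (for example, giving less weight to samples suspected of being noisy) would be applied by the caller through the `weight` argument; which policy the paper uses is not stated in the abstract.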
