Classifier Ensemble for Imbalanced Data Stream Classification Based on Accumulated Minorities (基于累积正样本的偏斜数据流集成分类方法)

Abstract: Existing methods for handling imbalanced (skewed) data streams tend either to over-fit or to make incomplete use of the data already available. To address this, the paper proposes EAMIDS, an ensemble classification method for imbalanced data streams based on accumulated positive samples. EAMIDS collects the positive samples from all data chunks that have arrived so far into a set AP, then uses the K-nearest-neighbors (KNN) algorithm and over-sampling to balance the class distribution of each chunk. When the number of base classifiers exceeds the fixed maximum size of the ensemble, the ensemble is updated according to F-Measure. Experiments on the synthetic datasets SEA and SPH show that EAMIDS achieves higher prediction accuracy than the IDSL and SMOTE approaches.
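The three steps named in the abstract (accumulating positives into AP, balancing a chunk with KNN and over-sampling, and pruning the ensemble by F-Measure) can be illustrated with a minimal Python sketch. The abstract does not say how KNN and over-sampling are combined, so the procedure below is an assumption: each positive in the current chunk borrows its k nearest neighbours from AP, and any remaining shortfall is filled by duplicating positives at random. The function names `f_measure`, `balance_chunk`, and `update_ensemble`, and the `(score, classifier)` tuple layout, are hypothetical and not taken from the paper.

```python
import math
import random


def f_measure(y_true, y_pred, positive=1):
    """F-Measure (harmonic mean of precision and recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def balance_chunk(positives, negatives, ap, k=3, rng=None):
    """Build the positive set for one chunk (assumed procedure, not the paper's).

    1. For each current positive, borrow its k nearest neighbours from AP,
       the positives accumulated from earlier chunks.
    2. If positives are still fewer than negatives, over-sample by
       duplicating random positives until both classes have the same size.
    3. Append the current positives to AP for use by later chunks.
    """
    rng = rng or random.Random(0)
    borrowed = set()
    for p in positives:
        nearest = sorted(range(len(ap)), key=lambda i: math.dist(p, ap[i]))[:k]
        borrowed.update(nearest)
    balanced = list(positives) + [ap[i] for i in sorted(borrowed)]
    while balanced and len(balanced) < len(negatives):
        balanced.append(rng.choice(balanced))
    ap.extend(positives)
    return balanced


def update_ensemble(ensemble, candidate, max_size):
    """Add a (f_measure, classifier) pair; on overflow drop the lowest-scoring one."""
    ensemble.append(candidate)
    if len(ensemble) > max_size:
        ensemble.remove(min(ensemble, key=lambda member: member[0]))
    return ensemble


# Toy usage with 2-D points; all values are made up for illustration.
ap = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1)]
positives = [(0.1, 0.1)]
negatives = [(3.0, 3.0), (4.0, 4.0), (3.5, 3.0), (4.2, 3.9)]
balanced = balance_chunk(positives, negatives, ap, k=2)

ensemble = [(0.6, "c1"), (0.4, "c2")]
update_ensemble(ensemble, (0.7, "c3"), max_size=2)
```

In this sketch the distant AP point `(5.0, 5.0)` is never borrowed, which is the intuition behind restricting reuse of old positives to neighbours of the current ones: positives that no longer resemble the current chunk are less likely to be reused after the concept has drifted.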

Authors: 郭文锋 (Guo Wenfeng), 王勇 (Wang Yong)

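Affiliations: School of Computer Science, Northwestern Polytechnical University; School of Science, Northwestern Polytechnical University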

Source: Computer and Modernization (《计算机与现代化》), 2015, No. 3, pp. 41-47 (7 pages)

Funding: Northwestern Polytechnical University Basic Research Fund (JC201273)

Keywords: imbalanced data streams; accumulated positive samples; ensemble classifiers; concept drift

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
