
Imbalanced Data Classification Method and its Application Research for Intrusion Detection
Cited by: 8
Abstract: When traditional classification methods are applied directly to imbalanced datasets, classification accuracy on the minority class is often poor. This paper proposes a classification method for imbalanced data based on the K-S statistic, with the aim of improving minority-class recognition. First, the K-S statistic is used to measure the relationship between the class label and each feature, and redundant features are removed. Next, a K-S based decision tree is built to split the training data into several segments, which adjusts the degree of imbalance within each segment. Finally, the segments are rebalanced by two-way (forward and backward) resampling before classification learning is carried out. The assumptions behind the K-S statistic are easily met in practice, so the method is efficient and widely applicable. A comparative analysis on the KDD99 intrusion detection data shows that, on imbalanced datasets, the method achieves high classification accuracy for both the majority and the minority class.
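The abstract's first step, scoring each feature by its K-S statistic against the class label, can be illustrated with a minimal sketch. It computes the standard two-sample K-S statistic (the largest gap between the two class-conditional empirical CDFs) together with the feature value at which that gap occurs. This record does not include the paper's full text, so this is an assumption about the general technique rather than the authors' implementation: the function names `ks_statistic` and `rank_features`, the 0/1 label encoding, and the toy data are all hypothetical. The paper's redundancy criterion and the tree-building procedure are not described here and are not reproduced.

```python
from bisect import bisect_right


def ks_statistic(values, labels):
    """Two-sample K-S statistic between class-1 and class-0 feature values.

    Returns (statistic, cut), where cut is the observed feature value at
    which the two empirical CDFs are furthest apart.
    """
    pos = sorted(v for v, y in zip(values, labels) if y == 1)
    neg = sorted(v for v, y in zip(values, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best_gap, best_cut = 0.0, None
    for cut in sorted(set(values)):
        # Empirical CDF of each class evaluated at the candidate cut point.
        cdf_pos = bisect_right(pos, cut) / len(pos)
        cdf_neg = bisect_right(neg, cut) / len(neg)
        gap = abs(cdf_pos - cdf_neg)
        if gap > best_gap:
            best_gap, best_cut = gap, cut
    return best_gap, best_cut


def rank_features(rows, labels):
    """Score every column by its K-S statistic, highest first."""
    scores = []
    for j in range(len(rows[0])):
        stat, cut = ks_statistic([row[j] for row in rows], labels)
        scores.append((j, stat, cut))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data (made up for illustration): column 0 separates the two
# classes cleanly, column 1 only partly.
rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.4, 2.0), (0.9, 3.0), (0.8, 1.5)]
labels = [0, 0, 0, 0, 1, 1]
ranking = rank_features(rows, labels)
print(ranking)
```

Because each class's CDF is normalized by that class's own size, the statistic is not dominated by the majority class, which is one plausible reason it suits imbalanced data. The cut returned alongside the statistic is the kind of value a K-S based tree could use as a split point, although the paper's exact splitting rule is not given in this record.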
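Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journal list, 2013, No. 4, pp. 131-135 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61103044), the Natural Science Foundation of Zhejiang Province (Y1110567), and the Zhejiang Provincial Department of Science and Technology program (2010C31126, 2011C21046)
Keywords: Imbalanced data; K-S statistic; Logistic regression; Intrusion detection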


