摘要
直接将传统的分类方法应用于不平衡数据集时,往往导致少数类的分类精度低下。提出一种基于K-S统计的不平衡数据分类方法,以有效提高少数类的识别率。利用K-S统计评估分类与特征之间的关系,去除冗余特征,并且构建K-S决策树获得数据分片,调整数据的不平衡度;最后对分片数据双向抽样调整,进行分类学习。该方法使用的K-S统计假设条件极易满足,其效率高且适用性强。通过KDD99入侵检测数据的分析对比表明,对于不平衡的数据集,该方法对多数类及少数类都具有较高的分类精度。
The traditional classification algorithms always have low classification accuracy rate especially for the minorityclass when they are directly employed on classifying imbalanced datasets.A K-S statistic based new classification method for imbalanced data was proposed to enhance the performance of minority class recognition.At first,the K-S statistic was employed as a correlation measure to remove redundant variables.Then a K-S based decision tree was built to segment the training data into several subsets.Finally,two-way resampling methods,forward and backward,were used to rebuild the segmentation datasets as to implement more reasonable classification learning.The proposed K-S based method,with a realistic assumption,is very high efficient and widely applicable.The KDD99 intrusion detection experimental analysis proves that the method has high classification accuracy rate of both minority and majority class for imbalanced datasets.
出处
《计算机科学》
CSCD
北大核心
2013年第4期131-135,共5页
Computer Science
基金
国家自然科学基金(61103044)
浙江省自然科学基金(Y1110567)
浙江省科技厅计划项目(2010C31126
2011C21046)资助
关键词
不平衡数据
K-S统计
逻辑回归
入侵检测
Imbalanced data
K-S statistic
Logistic regression
Intrusion detection