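To address imbalanced classification on heterogeneous datasets, this paper approaches the problem from three angles (dataset resampling, ensemble learning, and weak-classifier construction) and proposes HVDM-Adaboost-KNN (heterogeneous value difference metric-Adaboost-KNN), a classification method for heterogeneous imbalanced datasets. The algorithm first balances the dataset with a clustering algorithm, obtaining several balanced data subsets on which it builds several sub-classifiers. It measures the distance between two samples of a heterogeneous dataset with a heterogeneous distance metric, which improves the classification performance of KNN, and then iterates with Adaboost to obtain the final classifier. The algorithm's performance on imbalanced data was evaluated on 8 UCI datasets. The experimental results show that, compared with Adaboost and other algorithms, its F1, AUC and G-mean scores on heterogeneous imbalanced datasets all improve.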
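The abstract does not spell out the heterogeneous distance, so the sketch below assumes the standard HVDM of Wilson and Martinez (1997): the per-attribute distance is |x − y| / (4σ) for numeric attributes, the normalised value difference metric (Euclidean distance between the class-conditional probability vectors P(class | value)) for nominal attributes, and 1 when either value is missing; the total is the square root of the sum of squared per-attribute distances. It pairs that metric with a plain majority-vote KNN, the weak learner the abstract describes. The clustering-based balancing and the Adaboost loop are not shown, and the names `HVDM` and `knn_predict`, the toy data, and the handling of nominal values unseen in training are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from statistics import pstdev


class HVDM:
    """Heterogeneous Value Difference Metric fitted on labelled training data.

    Columns whose index is in `nominal` are categorical; all others are numeric.
    None marks a missing value.
    """

    def __init__(self, X, y, nominal):
        self.nominal = set(nominal)
        self.classes = sorted(set(y))
        self.sigma = {}   # numeric column -> population standard deviation
        self.counts = {}  # nominal column -> value -> Counter of class labels
        for a in range(len(X[0])):
            col = [row[a] for row in X]
            if a in self.nominal:
                table = defaultdict(Counter)
                for v, c in zip(col, y):
                    if v is not None:
                        table[v][c] += 1
                self.counts[a] = table
            else:
                vals = [v for v in col if v is not None]
                self.sigma[a] = pstdev(vals) if vals else 0.0

    def _vdm(self, a, u, v):
        # Normalised VDM: Euclidean distance between P(class | value) vectors.
        cu = self.counts[a].get(u, Counter())
        cv = self.counts[a].get(v, Counter())
        nu, nv = sum(cu.values()), sum(cv.values())
        if nu == 0 or nv == 0:
            return 0.0 if u == v else 1.0  # value unseen in training (assumption)
        return math.sqrt(sum((cu[c] / nu - cv[c] / nv) ** 2 for c in self.classes))

    def __call__(self, x, z):
        total = 0.0
        for a, (u, v) in enumerate(zip(x, z)):
            if u is None or v is None:
                d = 1.0  # missing value: maximal per-attribute distance
            elif a in self.nominal:
                d = self._vdm(a, u, v)
            else:
                s = self.sigma[a]
                d = abs(u - v) / (4 * s) if s > 0 else 0.0
            total += d * d
        return math.sqrt(total)


def knn_predict(X, y, query, k, dist):
    """Majority vote among the k training samples closest to `query` under `dist`."""
    nearest = sorted(zip(X, y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy heterogeneous data: one numeric and one nominal attribute (hypothetical).
X = [[1.0, "tcp"], [1.2, "tcp"], [5.0, "udp"], [5.5, "udp"]]
y = ["a", "a", "b", "b"]
hvdm = HVDM(X, y, nominal={1})
pred = knn_predict(X, y, [1.1, "tcp"], k=3, dist=hvdm)
```

Dividing numeric differences by 4σ keeps most numeric distances in roughly the same [0, 1] range as the nominal VDM term, so neither attribute type dominates the neighbour ranking.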
A new method called RS-MSVM (Rough Set and Multi-class Support Vector Machine) is proposed for network intrusion detection. The method uses rough set theory for attribute reduction and then an MSVM for classification. Rough set theory reduces the number of attributes of the network data used in this paper from 41 to 30. For the heterogeneous network data, a kernel function called HVDM-RBF (Heterogeneous Value Difference Metric Radial Basis Function) is constructed from the heterogeneous value difference metric for heterogeneous datasets. The MSVM is built from the HVDM-RBF kernel and the one-against-one method. The experiments use the DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluation data. The test results show that the method outperforms the other methods compared in this paper on six measures: detection accuracy, number of support vectors, false positive rate, false negative rate, training time, and testing time.
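As a minimal sketch of the HVDM-RBF idea, the code below assumes the kernel is the usual Gaussian RBF with HVDM in place of Euclidean distance, K(x, z) = exp(−γ · HVDM(x, z)²); the abstract does not give the exact form, and `gamma`, the helper names and the toy records are hypothetical. It also shows one-against-one voting over all k(k − 1)/2 class pairs. To stay short and dependency-free, each pairwise classifier is a stand-in that picks the class with the higher mean kernel similarity to x, not a trained SVM; in RS-MSVM each pair would be an SVM using this kernel. Rough-set attribute reduction is not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from statistics import pstdev


def fit_hvdm(X, y, nominal):
    """Return an HVDM distance fitted on training data (assumes no missing values)."""
    classes = sorted(set(y))
    sigma, probs = {}, {}
    for a in range(len(X[0])):
        col = [row[a] for row in X]
        if a in nominal:
            table = defaultdict(Counter)
            for v, c in zip(col, y):
                table[v][c] += 1
            # P(class | value) vector for each nominal value seen in training
            probs[a] = {v: [cnt[c] / sum(cnt.values()) for c in classes]
                        for v, cnt in table.items()}
        else:
            sigma[a] = pstdev(col)

    def dist(x, z):
        total = 0.0
        for a, (u, v) in enumerate(zip(x, z)):
            if a in nominal:
                pu, pv = probs[a].get(u), probs[a].get(v)
                if u == v:
                    d = 0.0
                elif pu is None or pv is None:
                    d = 1.0  # value unseen in training: maximal distance (assumption)
                else:
                    d = math.dist(pu, pv)
            else:
                d = abs(u - v) / (4 * sigma[a]) if sigma[a] > 0 else 0.0
            total += d * d
        return math.sqrt(total)

    return dist


def hvdm_rbf(dist, gamma):
    """RBF kernel with HVDM replacing Euclidean distance (assumed form)."""
    return lambda x, z: math.exp(-gamma * dist(x, z) ** 2)


def make_pairwise(X, y, kernel):
    """Stand-in binary classifier for the class pair (i, j); NOT an SVM."""
    def decide(i, j, x):
        def mean_sim(c):
            sims = [kernel(x, xi) for xi, yi in zip(X, y) if yi == c]
            return sum(sims) / len(sims)
        return i if mean_sim(i) >= mean_sim(j) else j
    return decide


def one_against_one_predict(x, classes, decide):
    """Each of the k(k-1)/2 pairwise classifiers casts one vote; the majority wins."""
    votes = Counter(decide(i, j, x) for i, j in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical records: [duration (numeric), protocol (nominal)].
X = [[0.0, "tcp"], [0.1, "tcp"], [5.0, "udp"], [5.2, "udp"], [9.0, "icmp"], [9.5, "icmp"]]
y = ["normal", "normal", "dos", "dos", "probe", "probe"]
hvdm = fit_hvdm(X, y, nominal={1})
K = hvdm_rbf(hvdm, gamma=1.0)
pred = one_against_one_predict([0.05, "tcp"], sorted(set(y)), make_pairwise(X, y, K))
```

With a real SVM library, the pairwise classifiers would be trained SVMs using this kernel; for example, scikit-learn's `SVC` accepts `kernel="precomputed"` with a Gram matrix built from `K`, and it already uses the one-against-one scheme internally for multi-class problems.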
Funding: Supported by the 863 High Tech. Project (2001AA140213) and the State Key Basic Research Project (2001CB309403).