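Abstract: To address imbalanced classification on heterogeneous datasets, this work proposes HVDM-Adaboost-KNN (heterogeneous value difference metric-Adaboost-KNN), a classification method for heterogeneous imbalanced datasets that approaches the problem from three angles: dataset resampling, ensemble learning, and weak-classifier construction. The algorithm first balances the dataset with a clustering algorithm, obtaining multiple balanced data subsets, and builds multiple sub-classifiers. It uses a heterogeneous distance to compute the distance between two samples in a heterogeneous dataset, which improves the classification performance of KNN, and then iterates with Adaboost to obtain the final classifier. Eight UCI datasets are used to evaluate the algorithm's classification performance on imbalanced datasets. The experimental results show that, compared with Adaboost and other algorithms, the method improves F1, AUC, and G-mean on heterogeneous imbalanced datasets.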
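The heterogeneous distance mentioned above can be sketched as follows. This is a minimal illustration of HVDM in its commonly cited form (numeric attributes scaled by four standard deviations, nominal attributes compared through class-conditional probabilities), not the authors' implementation. The function names `fit_hvdm` and `hvdm`, the use of the population standard deviation, and the handling of unseen nominal values are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def fit_hvdm(X, y, nominal):
    """Collect the statistics HVDM needs from training rows X with labels y.

    `nominal` is the set of column indices holding categorical values;
    every other column is treated as numeric.
    """
    classes = sorted(set(y))
    sigma = {}  # numeric column -> standard deviation
    cond = {}   # nominal column -> {value: {class: P(class | value)}}
    for a in range(len(X[0])):
        col = [row[a] for row in X]
        if a in nominal:
            counts = defaultdict(Counter)
            for value, label in zip(col, y):
                counts[value][label] += 1
            cond[a] = {
                value: {c: cnt[c] / sum(cnt.values()) for c in classes}
                for value, cnt in counts.items()
            }
        else:
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            sigma[a] = math.sqrt(var) or 1.0  # assumption: guard constant columns
    return {"nominal": set(nominal), "classes": classes,
            "sigma": sigma, "cond": cond}

def hvdm(model, u, v):
    """Heterogeneous distance between two rows u and v."""
    total = 0.0
    for a, (ua, va) in enumerate(zip(u, v)):
        if a in model["nominal"]:
            pu = model["cond"][a].get(ua)
            pv = model["cond"][a].get(va)
            if pu is None or pv is None:
                d2 = 1.0  # assumption: unseen value gets a fixed distance of 1
            else:
                d2 = sum((pu[c] - pv[c]) ** 2 for c in model["classes"])
        else:
            d2 = ((ua - va) / (4.0 * model["sigma"][a])) ** 2
        total += d2
    return math.sqrt(total)

# Toy data: one numeric column and one nominal column.
X = [[1.0, "a"], [2.0, "a"], [3.0, "b"], [4.0, "b"]]
y = [0, 0, 1, 1]
model = fit_hvdm(X, y, nominal={1})
d = hvdm(model, X[0], X[2])
```

A KNN weak classifier would call `hvdm` in place of the Euclidean distance when ranking neighbours, which is what lets it handle numeric and nominal attributes together.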
Funding: Supported by the 863 High Tech. Project (2001AA140213) and the State Key Basic Research Project (2001CB309403).
Abstract: A new method called RS-MSVM (Rough Set and Multi-class Support Vector Machine) is proposed for network intrusion detection. The method applies rough set theory for attribute reduction, followed by MSVM for classification. Rough set theory reduces the number of attributes of the network data used in this paper from 41 to 30. A kernel function, HVDM-RBF (Heterogeneous Value Difference Metric Radial Basis Function), based on the heterogeneous value difference metric of heterogeneous datasets, is constructed for the heterogeneous network data. HVDM-RBF and the one-against-one method are applied to build the MSVM. DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluation data were used in the experiment. The test results show that the method outperforms the other methods mentioned in this paper on six aspects: detection accuracy, number of support vectors, false positive rate, false negative rate, training time and testing time.
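The two building blocks named in the abstract, a distance-based RBF kernel and the one-against-one decomposition, can be sketched as below. The abstract does not give the kernel's exact formula, so the form exp(-gamma * d^2) is an assumption, and `toy_mixed_distance` is a hypothetical stand-in for HVDM used only to keep the example short. All function names and the sample class labels are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

def rbf_from_distance(dist, gamma):
    """Wrap a distance function into an RBF-style kernel exp(-gamma * d^2)."""
    def kernel(u, v):
        return math.exp(-gamma * dist(u, v) ** 2)
    return kernel

def gram_matrix(kernel, rows):
    """Pairwise kernel values, the input a precomputed-kernel SVM expects."""
    return [[kernel(u, v) for v in rows] for u in rows]

def one_against_one_pairs(classes):
    """One binary classifier is trained per unordered pair of classes."""
    return list(combinations(sorted(classes), 2))

def vote(pairwise_predictions):
    """Majority vote over the labels returned by the pairwise classifiers."""
    return Counter(pairwise_predictions).most_common(1)[0][0]

def toy_mixed_distance(u, v):
    """Hypothetical stand-in for HVDM: absolute difference on numeric
    fields, 0/1 mismatch on string fields."""
    total = 0.0
    for a, b in zip(u, v):
        if isinstance(a, str):
            total += 0.0 if a == b else 1.0
        else:
            total += (a - b) ** 2
    return math.sqrt(total)

kernel = rbf_from_distance(toy_mixed_distance, gamma=0.5)
rows = [[0.1, "tcp"], [0.4, "udp"], [0.1, "tcp"]]
K = gram_matrix(kernel, rows)

# Five illustrative traffic classes give 5 * 4 / 2 = 10 pairwise classifiers.
pairs = one_against_one_pairs(["normal", "dos", "probe", "r2l", "u2r"])
winner = vote(["dos", "probe", "dos"])
```

With k classes the one-against-one scheme trains k(k-1)/2 binary SVMs, each on the samples of two classes only, and the final label is the class that wins the most pairwise decisions.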