The value difference metric (VDM) is one of the best-known and widely used distance functions for nominal attributes. This work applies the instance weighting technique to improve VDM. An instance weighted value dif...The value difference metric (VDM) is one of the best-known and widely used distance functions for nominal attributes. This work applies the instance weighting technique to improve VDM. An instance weighted value difference met- ric (IWVDM) is proposed here. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for train- ing instances. Because early work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x, that belongs to the class c, is assigned according to the dif- ference between the estimated conditional probability P(c/x) by NB and the true conditional probability P(c/x), and the weight is adjusted iteratively. Compared with previous work, IWVDM has the advantage of reducing the time complex- ity of the process of finding weights, and simultaneously im- proving the performance of VDM. Experimental results on 36 UCI datasets validate the effectiveness of IWVDM.展开更多
A new method called RS-MSVM (Rough Set and Multi-class Support Vector Machine) is proposed for network intrusion detection. This method is based on rough set followed by MSVM for attribute reduction and classificati...A new method called RS-MSVM (Rough Set and Multi-class Support Vector Machine) is proposed for network intrusion detection. This method is based on rough set followed by MSVM for attribute reduction and classification respectively, The number of attributes of the network data used in this paper is reduced from 41 to 30 using rough set theory. The kernel function of HVDM-RBF (Heterogeneous Value Difference Metric Radial Basis Function), based on the heterogeneous value difference metric of heterogeneous datasets, is constructed for the heterogeneous network data. HVDM-RBF and one-against-one method are applied to build MSVM. DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluating data were used in the experiment. The testing results show that our method outperforms other methods mentioned in this paper on six aspects: detection accuracy, number of support vectors, false positive rate, falsc negative rate, training time and testing time.展开更多
It is well-known that in order to build a strong ensemble, the component learners should be with high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the component l...It is well-known that in order to build a strong ensemble, the component learners should be with high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the component learners constructed, then Bagging can effectively improve accuracy. However, for stable learners such as nearest neighbor classifiers, perturbing the training set can hardly produce diverse component learners, therefore Bagging does not work well. This paper adapts Bagging to nearest neighbor classifiers through injecting randomness to distance metrics. In constructing the component learners, both the training set and the distance metric employed for identifying the neighbors are perturbed. A large scale empirical study reported in this paper shows that the proposed BagInRand algorithm can effectively improve the accuracy of nearest neighbor classifiers.展开更多
文摘The value difference metric (VDM) is one of the best-known and widely used distance functions for nominal attributes. This work applies the instance weighting technique to improve VDM. An instance weighted value difference met- ric (IWVDM) is proposed here. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for train- ing instances. Because early work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x, that belongs to the class c, is assigned according to the dif- ference between the estimated conditional probability P(c/x) by NB and the true conditional probability P(c/x), and the weight is adjusted iteratively. Compared with previous work, IWVDM has the advantage of reducing the time complex- ity of the process of finding weights, and simultaneously im- proving the performance of VDM. Experimental results on 36 UCI datasets validate the effectiveness of IWVDM.
基金Supported by the 863 High Tech. Project (2001AA140213) and the State Key Basic Research Pro-ject (2001CB309403).
文摘A new method called RS-MSVM (Rough Set and Multi-class Support Vector Machine) is proposed for network intrusion detection. This method is based on rough set followed by MSVM for attribute reduction and classification respectively, The number of attributes of the network data used in this paper is reduced from 41 to 30 using rough set theory. The kernel function of HVDM-RBF (Heterogeneous Value Difference Metric Radial Basis Function), based on the heterogeneous value difference metric of heterogeneous datasets, is constructed for the heterogeneous network data. HVDM-RBF and one-against-one method are applied to build MSVM. DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluating data were used in the experiment. The testing results show that our method outperforms other methods mentioned in this paper on six aspects: detection accuracy, number of support vectors, false positive rate, falsc negative rate, training time and testing time.
文摘It is well-known that in order to build a strong ensemble, the component learners should be with high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the component learners constructed, then Bagging can effectively improve accuracy. However, for stable learners such as nearest neighbor classifiers, perturbing the training set can hardly produce diverse component learners, therefore Bagging does not work well. This paper adapts Bagging to nearest neighbor classifiers through injecting randomness to distance metrics. In constructing the component learners, both the training set and the distance metric employed for identifying the neighbors are perturbed. A large scale empirical study reported in this paper shows that the proposed BagInRand algorithm can effectively improve the accuracy of nearest neighbor classifiers.