Abstract
Clustering algorithms are widely used for data mining in intrusion detection systems (IDS). Although K-MEANS is one of the most classical clustering algorithms, applying it directly to IDS data sets gives poor results because of the particular characteristics of such data. To improve the clustering accuracy of K-MEANS on IDS data sets, a data preprocessing method is introduced. The method standardizes the features of IDS records so that numeric features whose original value ranges differ greatly all take values in the same interval. This removes the adverse influence of the different measurement scales in the original data and thereby improves the clustering result. Simulation experiments show that the clustering accuracy of K-MEANS on the preprocessed IDS data set is greatly improved.
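The effect described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact standardization formula, so min-max scaling to [0, 1] is assumed here as one common way to put all numeric features in the same interval. The six records, their feature names (duration, source bytes, error rate), and the helper names `min_max_scale` and `k_means` are hypothetical and are not taken from the paper. To keep the run deterministic, the first k points are used as initial centers, which is a simplification of the usual random initialization.

```python
def min_max_scale(rows):
    """Rescale every numeric column of `rows` to the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain K-MEANS (Lloyd iterations); returns one cluster label per point.

    Assumption: the first k points serve as initial centers so that the
    example is reproducible.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: (duration in s, source bytes, error rate).
# Even indices are "normal" traffic, odd indices are "attack" traffic.
records = [
    (1, 2000, 0.0), (50, 3000, 0.9),
    (2, 90000, 0.1), (52, 88000, 1.0),
    (1, 45000, 0.0), (51, 40000, 0.9),
]

raw_labels = k_means(records, k=2)
scaled_labels = k_means(min_max_scale(records), k=2)

print(raw_labels)     # [0, 0, 1, 1, 1, 1]: split follows the byte count only
print(scaled_labels)  # [0, 1, 0, 1, 0, 1]: split follows normal vs. attack
```

On the raw records the byte count, whose range is thousands of times larger than that of the other two features, dominates the Euclidean distance, so the clusters mix the two classes. After scaling, all three features contribute comparably and the two classes separate in this toy case.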
Source
Computer Technology and Development (《计算机技术与发展》)
2010, No. 7, pp. 129-131, F0003 (4 pages in total)
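Funding
National Natural Science Foundation of China (60863001)
Natural Science Basic Research Project of Jiangsu Higher Education Institutions (08KJB620002)
Nanjing University of Posts and Telecommunications Research Fund (NY207051)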
Keywords
data mining
intrusion detection system
K-MEANS clustering
preprocessing