Abstract
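Outlier detection is one of the most fundamental problems in data mining, with wide application in fraud detection, weather forecasting, customer segmentation, and intrusion detection. Motivated by the needs of network intrusion detection, this paper proposes a new outlier mining algorithm based on clustering of mixed attributes and, starting from the observation that outliers are by nature the rare points in a data set, gives new definitions of data similarity and of the degree of outlyingness. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognosis Breast Cancer data sets show that it detects the outliers in a data set well while offering near-linear time complexity and good scalability.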
The outlier detection problem has important applications in fraud detection, weather prediction, customer segmentation, and intrusion detection. Many recent algorithms use concepts of proximity to find outliers based on their relationship to the rest of the data. In this paper we propose a new clustering-based algorithm for detecting outliers in high-dimensional domains with mixed attributes, together with a new method for measuring the similarity and outlyingness of objects. The proposed algorithm achieves near-linear running time. Experimental results on the KDDCUP99 and Wisconsin Breast Cancer datasets show that the algorithm is effective and scalable and achieves reasonably good accuracy.
Source
《计算机应用》
CSCD
Peking University Core Journal
2005, No. 6, pp. 1353-1356 (4 pages)
Journal of Computer Applications
Funding
Supported by the National Natural Science Foundation of China (60273075)
Keywords
outlier detection
clustering
data mining