Abstract
This paper introduces a new distance definition and an outlier factor for objects, and on this basis proposes a two-stage outlier detection approach named TOD. The first stage clusters the data with a new clustering algorithm; the second stage identifies outliers using the outlier factor of each object. The time complexity of TOD is linear in the size of the dataset and nearly linear in the number of attributes, so the algorithm scales well and is suitable for large datasets. Theoretical analysis and experimental results show that TOD is robust and practical.
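The abstract does not give TOD's distance definition, clustering algorithm, or outlier factor, so the following is only a generic sketch of the two-stage idea (cluster first, then score), not the paper's method. Everything in it is an assumption made for illustration: Euclidean distance, a single-pass "leader" clustering with a hypothetical `radius` threshold, cluster size as a stand-in outlier score, and the hypothetical names `leader_cluster`, `flag_outliers`, and `max_fraction`.

```python
import math


def leader_cluster(points, radius):
    """Stage 1 (assumed): one pass over the data.

    Each point joins the nearest existing cluster whose centroid lies
    within `radius`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster keeps a running coordinate sum and member indices
    for i, p in enumerate(points):
        best, best_d = None, radius
        for c in clusters:
            n = len(c["members"])
            centroid = [s / n for s in c["sum"]]
            d = math.dist(p, centroid)  # Euclidean distance (an assumption)
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            clusters.append({"sum": list(p), "members": [i]})
        else:
            best["sum"] = [s + x for s, x in zip(best["sum"], p)]
            best["members"].append(i)
    return clusters


def flag_outliers(points, radius, max_fraction):
    """Stage 2 (stand-in score): treat the smallest clusters as outliers.

    Clusters are taken in order of increasing size until adding the next
    one would exceed `max_fraction` of the dataset.
    """
    clusters = sorted(leader_cluster(points, radius), key=lambda c: len(c["members"]))
    budget = max_fraction * len(points)
    outliers = []
    for c in clusters:
        if len(outliers) + len(c["members"]) > budget:
            break
        outliers.extend(c["members"])
    return sorted(outliers)
```

Because each point is compared only against the current cluster centroids, the cost of this sketch grows linearly with the number of points when the number of clusters stays bounded, which mirrors the kind of scalability the abstract claims without reproducing TOD itself.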
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2005, No. 7, pp. 1237-1240 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60273075)
Keywords
clustering
outlier factor
outlier detection