Abstract
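Sliding-window-based anomaly detection is an important topic in data stream mining research. In many applications, data streams are transmitted over a distributed network, and distributed computing techniques are commonly adopted for such problems in order to obtain high-quality results in real time. This paper formally states the problem of continuous anomaly detection over distributed evolving data streams and proposes two anomaly definitions and algorithms based on kernel density estimation. Extensive experiments on real datasets show that the algorithms are efficient and scalable and fully meet the needs of data stream applications.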
Anomaly detection based on sliding windows is a key problem in data stream research. In many cases, however, stream data are transmitted over a distributed network, so distributed computation is required to guarantee high-quality results in real time even as new data arrive. This paper first formalizes the problem of continuous outlier detection over distributed evolving data streams. Two outlier measures and algorithms based on a kernel density estimator are then proposed, which can identify outliers in a single pass. Experiments with synthetic datasets show that the proposed methods are both efficient and effective compared with existing outlier detection algorithms, and are better suited to data streams.
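To make the general idea concrete, the following is a minimal single-node sketch of density-based outlier flagging over a sliding window. It is not the paper's method: the abstract gives neither the two outlier measures nor the distributed protocol, so the Gaussian kernel, the fixed `bandwidth`, the density `threshold`, and the function names `gaussian_kde` and `detect_outliers` are all illustrative assumptions.

```python
import math
from collections import deque


def gaussian_kde(x, window, bandwidth):
    """Gaussian kernel density estimate at x from the 1-D values in window."""
    if not window:
        return 0.0
    norm = 1.0 / (len(window) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in window)


def detect_outliers(stream, window_size=50, bandwidth=1.0, threshold=0.01):
    """Flag (index, value) pairs whose estimated density is below threshold.

    Assumption: a point is scored against the window of values that arrived
    before it, and scoring starts only once the window is full.
    """
    window = deque(maxlen=window_size)  # oldest value is evicted automatically
    flagged = []
    for i, x in enumerate(stream):
        if len(window) == window_size and gaussian_kde(x, window, bandwidth) < threshold:
            flagged.append((i, x))
        window.append(x)  # single pass: each value is seen exactly once
    return flagged
```

Each arriving value is scored once and then appended, so the stream is processed in one pass. The cost per point is linear in the window size, which a practical implementation would likely reduce by sampling or by summarizing the window.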
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2008, Issue 7, pp. 174-178 (5 pages)
Funding
National High-Tech Research and Development Plan of China (863 Program), Grant No. 2004AA112020
National Grand Fundamental Research 973 Program of China, Grant No. 2005CB321804
Natural Science Foundation of Hunan Province of China, Grant No. 03JJY6023
Key Science and Technology Project of Hunan Province, No. 05GK2002
Key Science and Technology Project of Changsha City, No. K06070001-12
Keywords
evolving data streams
kernel density estimator
data mining
outlier detection
sliding window