摘要
基于数据流数据的挖掘算法研究受到了越来越多的重视.针对分布式数据流环境,提出基于核密度估计的分布数据流离群点检测算法.算法将各分布节点上的数据流作为全局数据流的子集,通过分布节点与中心节点的通信,维护基于全局数据流的分布密度估计.各分布节点基于该估计对其上的分布数据流进行离群点检测,从而得到基于全局数据流的离群点集合.对节点之间的交互以及离群点检测算法的细节进行了讨论.通过实验验证了算法的适用性和有效性.
Recently, there has been occurring more and more applications based on data stream models. Data mining in data stream, such as clustering, classifying, etc, becomes a hot research field. This paper presents an algorithm for outlier detection in distributed data streams. The data stream on every distributed node is taken for a subset of the global data stream, which consists of data on all distributed nodes. Because of huge network traffic, it is impossible to send all data to a central node and do detection. Based on the communication of distribution information between distributed nodes and the central node, the algorithm maintains the density estimation for the union of all streams. On every distributed node, global outliers can be detected by the estimation. Details of communication schedule and outlier detection are also discussed in this paper. Experimental results show promising availabilities of the approach.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2005年第9期1498-1504,共7页
Journal of Computer Research and Development
基金
国家自然科学基金项目(70371015)
教育部高等学校博士学科点科研基金项目(20040286009)~~
关键词
分布数据流
离群点检测
核密度估计
distributed data streams
outlier detection
kernel density estimation