
Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation (基于核密度估计的分布数据流离群点检测)

Cited by: 8
Abstract: Mining algorithms for data streams are receiving growing attention, and tasks such as clustering and classification over streams have become an active research area. This paper presents a kernel-density-estimation-based algorithm for outlier detection in distributed data streams. The stream on each distributed node is treated as a subset of the global data stream, which is the union of the data on all nodes. Because the network traffic would be prohibitive, shipping all data to a central node for detection is impractical. Instead, the distributed nodes exchange distribution information with the central node, and through this communication the algorithm maintains a density estimate for the global stream. Each node then runs outlier detection on its local stream against that global estimate, which yields the set of outliers with respect to the global stream. The paper also discusses the details of the inter-node communication schedule and of the detection procedure. Experiments verify the applicability and effectiveness of the approach.
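Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 9, pp. 1498-1504 (7 pages)
Funding: National Natural Science Foundation of China (70371015); Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)
Keywords: distributed data streams; outlier detection; kernel density estimation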
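
The scheme outlined in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, whose details are not given on this page. It assumes one-dimensional data, a Gaussian kernel, a reservoir sample as each node's summary, and a fixed density threshold. The names `Node`, `global_density`, and `is_outlier`, and the `bandwidth` and `threshold` parameters, are all hypothetical.

```python
import math
import random


class Node:
    """A distributed node that keeps a bounded uniform sample of its local stream.

    Assumption: the summary sent to the central node is a reservoir sample
    plus the number of items seen; the paper may use a different summary.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self._rng = random.Random(seed)

    def observe(self, x):
        # Reservoir sampling: every item seen so far is kept with equal probability.
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = x

    def summary(self):
        # What the node would send to the central node in this sketch.
        return list(self.sample), self.seen


def global_density(summaries, bandwidth):
    """Central node: build a weighted Gaussian kernel density estimate.

    Each sampled point is weighted by how many stream items it stands for,
    so nodes with longer streams contribute proportionally more.
    """
    points, weights = [], []
    for sample, seen in summaries:
        if sample:
            points.extend(sample)
            weights.extend([seen / len(sample)] * len(sample))
    if not points:
        raise ValueError("no data received from any node")
    norm = 1.0 / (sum(weights) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            for p, w in zip(points, weights)
        )

    return density


def is_outlier(x, density, threshold):
    # Assumption: a point is an outlier when its estimated density is below a fixed threshold.
    return density(x) < threshold


# Two nodes whose local streams cover different value ranges.
a, b = Node(capacity=50), Node(capacity=50, seed=1)
for i in range(21):
    a.observe(-1.0 + 0.1 * i)  # values in [-1, 1]
    b.observe(9.0 + 0.1 * i)   # values in [9, 11]

local_a = global_density([a.summary()], bandwidth=0.5)
global_f = global_density([a.summary(), b.summary()], bandwidth=0.5)

print(is_outlier(10.0, local_a, 0.01))   # judged against node a's data only
print(is_outlier(10.0, global_f, 0.01))  # judged against the global estimate
print(is_outlier(50.0, global_f, 0.01))
```

In this toy setup the value 10.0 looks anomalous to node `a` in isolation but is ordinary under the global estimate, which is why detection against the union of all streams differs from purely local detection. Only the summaries cross the network, never the raw streams.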

References (13)

  • 1. S. Muthukrishnan. Data streams: Algorithms and applications. In: Proc. the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia: Society for Industrial and Applied Mathematics, 2003. 413-413.
  • 2. Jin Cheqing, Qian Weining, Zhou Aoying. Analysis and management of streaming data: A survey. Journal of Software, 2004, 15(8): 1172-1181. (Cited by: 161)
  • 3. D. Hawkins. Identification of Outliers. London: Chapman and Hall, 1980.
  • 4. E. M. Knorr, R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In: Proc. the 24th Int'l Conf. Very Large Databases. New York: ACM Press, 1998. 392-403.
  • 5. D. Yu, G. Sheikholeslami, A. Zhang. FindOut: Finding outliers in very large datasets. Knowledge and Information Systems, 2002, 4(4): 387-412.
  • 6. M. M. Breunig, H. Kriegel, R. T. Ng, et al. LOF: Identifying density-based local outliers. In: Proc. the 2000 ACM SIGMOD Int'l Conf. Management of Data. New York: ACM Press, 2000. 93-104.
  • 7. S. Papadimitriou, H. Kitagawa, P. B. Gibbons, et al. LOCI: Fast outlier detection using the local correlation integral. In: Proc. the 19th Int'l Conf. Data Engineering. Los Alamitos, CA: IEEE Computer Society Press, 2003. 315-326.
  • 8. S. Muthukrishnan, R. Shah, J. Vitter. Mining deviants in time series data streams. In: Proc. the 16th Int'l Conf. Scientific and Statistical Database Management. Los Alamitos, CA: IEEE Computer Society Press, 2004. 41-50.
  • 9. H. V. Jagadish, N. Koudas, S. Muthukrishnan. Mining deviants in a time series database. In: Proc. the 25th Int'l Conf. Very Large Data Bases. San Francisco: Morgan Kaufmann, 1999. 102-113.
  • 10. T. Palpanas, D. Papadopoulos, V. Kalogeraki, et al. Distributed deviation detection in sensor networks. SIGMOD Record, 2003, 32(4): 77-82.

