Research on the Key Technologies for Classification of Distributed Data Streams (cited by: 2)
Abstract: As data collection and generation technologies mature, more and more applications produce data streams. In recent years, with the further spread of network applications, single-stream applications have been shifting toward multi-node distributed data streams, as in sensor networks, network monitoring, web logs, and multi-site credit card transaction data. Such data are not only real-time, continuous, and large in scale, but also distributed. How to manage and analyze large-scale, distributed, dynamic datasets is therefore an important problem for researchers. In response, this paper gives formal descriptions of homogeneous and heterogeneous distributed data streams, analyzes the strengths and weaknesses of centralized and distributed stream processing architectures, discusses recent progress in classification algorithms for distributed data streams, and summarizes the problems and challenges facing distributed data stream mining, together with possible directions for future research.
Source: Journal of North China Institute of Science and Technology, 2015, No. 4, pp. 119-124 (6 pages)
Funding: Fundamental Research Funds for the Central Universities (3142014096, 3142014087, 3142014125, 3142013098)
Keywords: distributed data streams; data mining; classification