
Study on an algorithm for discovering frequent items in distributed data streams (cited by: 1)
Abstract: This paper studies algorithms for discovering frequent items in distributed data streams. A novel Distributed Synopsis Algorithm (DSA) is used to build synopsis structures from the leaf nodes up to the root node, and the communication load is minimized by setting an appropriate precision gradient for each distribution condition. Experiments on real data sets verify the effectiveness of the structure and the algorithm.
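The abstract does not spell out how DSA works, so the following Python sketch is only an illustration of the general idea it describes: each leaf summarizes its local stream with a small error tolerance, each parent merges its children's summaries and prunes again with a larger tolerance (the "precision gradient"), and the root reports the frequent items. The function names, the tolerances `eps_leaf` and `eps_root`, and the pruning rule are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

def leaf_synopsis(items, eps):
    """Count a leaf's items and drop those with count <= eps * n.

    Every dropped item is undercounted by at most eps * n.
    """
    n = len(items)
    counts = Counter(items)
    return n, {x: c for x, c in counts.items() if c > eps * n}

def merge_synopses(children, eps_child, eps_parent):
    """Merge child synopses, then prune with the extra error budget.

    Assumes eps_parent >= eps_child, so the total undercount of any
    item stays within eps_parent * total.
    """
    total = sum(n for n, _ in children)
    merged = Counter()
    for _, counts in children:
        merged.update(counts)
    slack = (eps_parent - eps_child) * total
    return total, {x: c for x, c in merged.items() if c > slack}

def frequent_items(synopsis, support, eps_root):
    """Report items whose estimated count reaches (support - eps_root) * n."""
    n, counts = synopsis
    return {x for x, c in counts.items() if c >= (support - eps_root) * n}

# Hypothetical two-leaf tree with made-up data.
eps_leaf, eps_root = 0.02, 0.05
leaf_a = leaf_synopsis(["a"] * 60 + ["b"] * 39 + ["c"], eps_leaf)
leaf_b = leaf_synopsis(["a"] * 50 + ["d"] * 49 + ["c"], eps_leaf)
root = merge_synopses([leaf_a, leaf_b], eps_leaf, eps_root)
print(frequent_items(root, support=0.3, eps_root=eps_root))  # prints {'a'}
```

In this sketch, rare items such as `"c"` are pruned at the leaves and never sent upward, which is where the communication saving comes from. Widening the gap between the leaf and root tolerances lets the parent prune more of the merged summary before passing it on, while tightening the leaf tolerance makes the leaves send more.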
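Authors: Yang Ying (杨颖), Yang Lei (杨磊)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journal list; 2008, No. 1, pp. 136-139 (4 pages)
Funding: National 863 Program of China (2002AA4Z3430); Guangxi Natural Science Foundation (桂科基200731023); Guangxi Department of Education research project (200626)
Keywords: data stream; frequent item; synopsis structure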

