
An Algorithm for Mining Frequent Itemsets in Data Streams
Abstract Unlike data in a traditional static database, a data stream is an ordered sequence of itemsets that arrive over time, which makes classical frequent itemset mining algorithms difficult to apply to data streams. Based on the characteristics of data streams, this paper proposes FP-SegCount, an algorithm for mining frequent itemsets from data streams. The algorithm partitions the data stream into segments and uses a modified FP-growth algorithm to mine the frequent itemsets in each segment. It then uses a Count-Min Sketch to count the itemsets. The algorithm addresses the problems of compact counting and of fast, efficient computation. An experimental comparison with the FP-DS algorithm shows that FP-SegCount has good time efficiency.
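The three steps named in the abstract (segment the stream, mine each segment, count itemsets in a Count-Min Sketch) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not describe the modified FP-growth step, so a brute-force enumeration stands in for it here, and every name below (`CountMinSketch`, `frequent_itemsets`, `mine_stream`, `segment_size`, `min_support`, the table dimensions) is an assumption made for illustration.

```python
import hashlib
from collections import Counter
from itertools import combinations


class CountMinSketch:
    """Fixed-size table of counters; estimates never undercount what was added."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One column per row, derived from a row-salted SHA-256 digest.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum over rows is the cell least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(key))


def frequent_itemsets(segment, min_count, max_size=3):
    """Brute-force stand-in (assumption) for the per-segment FP-growth step."""
    counts = Counter()
    for transaction in segment:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


def flush_segment(segment, min_support, sketch):
    """Mine one segment and add its locally frequent itemsets to the sketch."""
    min_count = max(1, int(min_support * len(segment)))
    for itemset, n in frequent_itemsets(segment, min_count).items():
        sketch.add(",".join(itemset), n)


def mine_stream(stream, segment_size, min_support, sketch):
    """Split the stream into fixed-size segments and process each in turn."""
    segment = []
    for transaction in stream:
        segment.append(transaction)
        if len(segment) == segment_size:
            flush_segment(segment, min_support, sketch)
            segment = []
    if segment:  # trailing partial segment
        flush_segment(segment, min_support, sketch)


stream = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"],
          ["a", "b"], ["a", "b", "d"], ["a", "d"], ["a", "b"]]
sketch = CountMinSketch()
mine_stream(stream, segment_size=4, min_support=0.5, sketch=sketch)
ab_estimate = sketch.estimate("a,b")
```

In this toy stream the itemset {a, b} is frequent in both segments (2 and 3 occurrences), so `ab_estimate` is at least 5; hash collisions can only push it higher. Because this sketch adds only segment-frequent itemsets, occurrences in segments where an itemset falls below the local threshold are not counted, so the totals are approximate in both directions relative to the true stream counts.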
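Author Meng Caixia (孟彩霞)
Source Journal of Kunming University of Science and Technology (Natural Science Edition), 2009, No. 5, pp. 26-30, 35 (6 pages in total). Listed in the Peking University Core Journals index (北大核心).
Funding National Natural Science Foundation of China (Grant No. 60573096); Natural Science Foundation of Shaanxi Province (Grant No. 2004f283); Xi'an Science and Technology Innovation Support Program, Applied Development Research Project (Grant No. YF07024)
Keywords data stream; data mining; data stream mining; frequent itemsets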

