
A High-Speed Heuristic Algorithm for Mining Frequent Patterns in Data Stream (cited by: 14)
Abstract: Existing algorithms for mining frequent patterns in data streams fall into two groups. Batch methods have a short average processing time, but they must accumulate enough data before processing, so they respond late and support only coarse-grained queries. Heuristic methods process the stream directly, but they are slow. This paper proposes an improved lexicographic tree, IL-TREE (improved lexicographic tree), and builds on it a new heuristic algorithm, FPIL-Stream (frequent pattern mining based on improved lexicographic tree), which locates historical patterns quickly when updating existing patterns and generating new ones. The algorithm also incorporates a tilted window strategy to record historical information in detail. While still processing the data stream in a timely manner, it reduces the average processing time and provides a finer query granularity.
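Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, Issue 12, pp. 2099-2105 (7 pages)
Funding: National Natural Science Foundation of China
Keywords: data mining; data stream; frequent pattern; tilted window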
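
Illustrative sketch: the abstract credits the lexicographic tree with fast lookup of previously stored patterns, but it does not describe IL-TREE's actual layout. The Python below is therefore not the paper's IL-TREE or FPIL-Stream. It is a plain prefix tree over lexicographically sorted itemsets, and the names `Node`, `LexTree`, `add`, and `count` are hypothetical. It only shows why such a tree can find a stored pattern in a number of steps proportional to the pattern's length.

```python
from typing import Dict, Iterable


class Node:
    """One node of a prefix tree; the path from the root spells a sorted itemset."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.count = 0  # how often the itemset ending at this node was added


class LexTree:
    """Hypothetical generic lexicographic tree (an assumption, not the paper's IL-TREE)."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, itemset: Iterable[str], n: int = 1) -> None:
        """Insert the itemset, or raise its count by n if it is already stored."""
        node = self.root
        for item in sorted(set(itemset)):  # a fixed item order gives one path per itemset
            node = node.children.setdefault(item, Node())
        node.count += n

    def count(self, itemset: Iterable[str]) -> int:
        """Return the stored count of the itemset, or 0 if it was never added."""
        node = self.root
        for item in sorted(set(itemset)):
            child = node.children.get(item)
            if child is None:
                return 0
            node = child
        return node.count


tree = LexTree()
for transaction in (["b", "a"], ["a", "b"], ["a", "c"]):
    tree.add(transaction)
```

Because items are sorted before insertion, `["b", "a"]` and `["a", "b"]` share one path, so updating a historical pattern never requires scanning the other stored patterns. A tilted window, as named in the abstract, would need per-node time-granularity counters, which this sketch omits.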

