
PGG: An Online Pattern Based Approach for Stream Variation Management

Abstract: Many database applications require efficient processing of data streams with value variations and fluctuating sampling frequency. The variations typically imply fundamental features of the stream and important domain knowledge about the underlying objects. In some data streams, successive events seem to recur at a certain time interval, but the data in fact evolves with tiny differences as time elapses. This feature, so-called pseudo periodicity, poses a new challenge to stream variation management. This study focuses on the online management of variations over such streams. The idea can be applied to many scenarios, such as patient vital sign monitoring in medical applications. This paper proposes a new method named Pattern Growth Graph (PGG) to detect and manage variations over evolving streams, with the following features: 1) it adopts the wave-pattern to capture the major information of data evolution and represent it compactly; 2) it detects the variations in a single pass over the stream with the help of a wave-pattern matching algorithm; 3) it stores only the segments of the pattern that differ for the incoming stream, and hence substantially compresses the data without losing important information; 4) it distinguishes meaningful data changes from noise and reconstructs the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items, as well as a prototype system, are carried out to demonstrate the feasibility and effectiveness of the proposed scheme.
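The four features listed in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's PGG algorithm: the abstract does not give the wave-pattern definition, the matching rule, or the graph structure, so everything below is an assumption made for illustration. In particular, `split_into_waves` cuts the stream at a fixed `period` (real pseudo-periodic streams would need variable-length waves), `PatternStore` keeps whole waves rather than only the differing segments, and `tolerance` is a hypothetical per-point noise threshold.

```python
from dataclasses import dataclass, field


def split_into_waves(values, period):
    """Cut the stream into fixed-length waves (assumption: known, constant period)."""
    return [values[i:i + period] for i in range(0, len(values), period)]


@dataclass
class PatternStore:
    """Hypothetical store: keeps one copy of each distinct wave seen so far."""
    tolerance: float                                # per-point noise threshold (assumed)
    patterns: list = field(default_factory=list)    # distinct stored waves
    sequence: list = field(default_factory=list)    # pattern index for each incoming wave

    def _matches(self, wave, pattern):
        # A wave matches a stored pattern if every point is within the tolerance.
        return len(wave) == len(pattern) and all(
            abs(a - b) <= self.tolerance for a, b in zip(wave, pattern)
        )

    def add_wave(self, wave):
        # Single pass: each wave is compared once against the stored patterns.
        for idx, pattern in enumerate(self.patterns):
            if self._matches(wave, pattern):
                self.sequence.append(idx)           # small difference: treated as noise
                return idx
        self.patterns.append(list(wave))            # real variation: store a new pattern
        self.sequence.append(len(self.patterns) - 1)
        return len(self.patterns) - 1

    def reconstruct(self):
        # Rebuild an approximation of the stream from the stored patterns.
        out = []
        for idx in self.sequence:
            out.extend(self.patterns[idx])
        return out


# Three waves: the second differs from the first only by noise, the third has a real change.
stream = [0, 5, 0, -5, 0.1, 5.1, 0, -5, 0, 9, 0, -5]
store = PatternStore(tolerance=0.5)
for wave in split_into_waves(stream, period=4):
    store.add_wave(wave)

print(store.sequence)        # which stored pattern each wave maps to
print(len(store.patterns))   # number of distinct patterns kept
```

In this toy run, the second wave is absorbed into the first pattern and only the third wave adds a new one, so the reconstruction differs from the input by at most `tolerance` at any point. That is the trade-off the abstract describes as compression "with acceptable accuracy".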
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, Issue 4, pp. 497-515 (19 pages); 计算机科学技术学报(英文版)
Funding: National Natural Science Foundation of China under Grant No. 60673113; FUJITSU.
Keywords: data stream, noise reorganization, pattern representation, variation management
