Abstract
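Dynamically maintaining a synopsis structure of a data stream is the foundation for processing tasks such as data stream querying and mining. In many data stream applications, the influence of data decays over time and the data in the stream are gradually forgotten; this is called the amnesic feature of data streams, and the construction of a stream synopsis should reflect it. The discrete wavelet transform is a widely used method for constructing data stream synopses. This paper introduces the amnesic feature into the construction of wavelet synopses and proposes a wavelet synopsis structure that reflects it, the wavelet-based hierarchical amnesic synopsis. The construction of this synopsis is discussed under two error metrics, the sum of squared errors and the maximum absolute error. Experiments verify the effectiveness of the synopsis.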
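The synopsis above is built from wavelet coefficients under an error metric such as the sum of squared errors. As a minimal sketch, assuming the standard orthonormal Haar transform on a block whose length is a power of two (the paper's exact basis and normalization are not stated in this record), the code below keeps the B largest-magnitude coefficients, which is the SSE-optimal choice for an orthonormal basis, and checks that the reconstruction error equals the energy of the dropped coefficients. The helper `amnesic_budgets` and its halving schedule are hypothetical, included only to illustrate giving older data fewer coefficients; they are not the W-HAS construction rule.

```python
import math
from typing import List


def haar_decompose(data: List[float]) -> List[float]:
    """Orthonormal Haar DWT; len(data) must be a power of two.

    Output order: [scaling coefficient, coarsest detail, ..., finest details].
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(x) for x in data]
    details: List[float] = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        # Finer detail levels end up after coarser ones.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        current = [(a + b) / math.sqrt(2) for a, b in pairs]
    return current + details


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Inverse of haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        dets = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt: List[float] = []
        for s, d in zip(current, dets):
            nxt.extend([(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)])
        current = nxt
    return current


def top_b_synopsis(coeffs: List[float], b: int) -> List[float]:
    """Keep the b largest-magnitude coefficients and zero the rest.

    For an orthonormal basis this minimizes the SSE of the reconstruction,
    and that SSE equals the energy of the dropped coefficients (Parseval).
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:b])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def amnesic_budgets(num_blocks: int, newest_budget: int) -> List[int]:
    """HYPOTHETICAL schedule: block 0 is the newest; each older block gets
    half the coefficients of the next newer one, but never fewer than one."""
    return [max(1, newest_budget >> k) for k in range(num_blocks)]


def sse(x: List[float], y: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    coeffs = haar_decompose(data)
    kept = top_b_synopsis(coeffs, 3)
    dropped_energy = sum(c * c for c in coeffs) - sum(c * c for c in kept)
    # The two printed values agree, as Parseval's identity predicts.
    print(round(sse(data, haar_reconstruct(kept)), 6), round(dropped_energy, 6))
    print(amnesic_budgets(4, 8))
```

The maximum-absolute-error case is deliberately left out: greedy top-B selection is not optimal for that metric, so it needs a different construction.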
Maintaining a synopsis data structure dynamically for a data stream is vital for a variety of streaming data applications, such as approximate querying and data mining. In many cases, the significance of a data item in a stream decays with age: the item may convey critical information at first, but as time goes by it becomes less and less important until it eventually becomes useless. This feature is termed amnesic. The discrete wavelet transform is often used to construct synopses for streaming data. This paper proposes a wavelet-based hierarchical amnesic synopsis (W-HAS), which incorporates the amnesic feature of data streams into the generation of wavelet synopses. W-HAS can provide a better approximate representation of data streams with the amnesic feature than conventional wavelet synopses. To maintain W-HAS online for evolving data streams, the authors first explore the merger process of two wavelet decompositions and then implement the addition of data nodes to the W-HAS structure based on this merger process. Through the addition of data nodes, W-HAS grows dynamically and hierarchically. The construction methods of W-HAS under the sum of squared errors (SSE) and maximum absolute error metrics are discussed. W-HAS with error control is also explored. Finally, experiments on real and synthetic datasets validate the proposed methods.
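The online maintenance step above merges two wavelet decompositions. For the orthonormal Haar transform, a sketch of such a merge, assuming two adjacent blocks of equal power-of-two length stored in the coefficient layout used below, needs only the two scaling coefficients: they produce the new scaling and top-level detail coefficients, and every existing detail level is reused unchanged, older (left) block first. The names `merge_haar` and `push_block`, and the binary-counter growth rule in `push_block`, are hypothetical illustrations, not the authors' node-addition procedure.

```python
import math
from typing import List


def haar_decompose(data: List[float]) -> List[float]:
    """Orthonormal Haar DWT; output is [scaling, coarsest detail, ..., finest]."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(x) for x in data]
    details: List[float] = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        current = [(a + b) / math.sqrt(2) for a, b in pairs]
    return current + details


def merge_haar(left: List[float], right: List[float]) -> List[float]:
    """Combine the decompositions of two adjacent, equal-length blocks
    (left is older) into the decomposition of their concatenation,
    without access to the raw data."""
    if len(left) != len(right):
        raise ValueError("blocks must have the same length")
    merged = [(left[0] + right[0]) / math.sqrt(2),
              (left[0] - right[0]) / math.sqrt(2)]
    size = 1
    while size < len(left):
        # Each detail level of the merged block is the left block's level
        # followed by the right block's level of the same size.
        merged += left[size:2 * size] + right[size:2 * size]
        size *= 2
    return merged


def push_block(stack: List[List[float]], block_coeffs: List[float]) -> None:
    """HYPOTHETICAL growth rule: push a new block's decomposition, then merge
    while the two newest entries have equal length (binary-counter style)."""
    stack.append(block_coeffs)
    while len(stack) >= 2 and len(stack[-1]) == len(stack[-2]):
        right = stack.pop()
        left = stack.pop()
        stack.append(merge_haar(left, right))


if __name__ == "__main__":
    stack: List[List[float]] = []
    for block in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
        push_block(stack, haar_decompose(block))
    print([len(entry) for entry in stack])
```

Each merge only copies existing coefficients and computes two new ones, so the raw values of a block never need to be revisited once it has been decomposed.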
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University core journal list)
2009, No. 2, pp. 268-279 (12 pages)
Journal of Computer Research and Development
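Funding
National Natural Science Foundation of China (60773072)
Natural Science Foundation of Zhejiang Province (Y104144)
Scientific Research Project of the Zhejiang Provincial Department of Education (20051737)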
Keywords
synopsis structure
amnesic feature
discrete wavelet transform
data stream
approximate representation