Abstract
Identifying important data items in high-speed, massive input is a key problem in distributed data stream applications. To address it, this paper defines the concept of effective data and proposes an algorithm for identifying effective data. The algorithm is based on the data sketch technique: within a user-specified error bound, it outputs the effective data with probability close to 1 while consuming little memory. Experiments and algorithm analysis verify the effectiveness of the algorithm.
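The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of sketch-based item identification, assuming a Count-Min sketch as the underlying structure. The class `CountMinSketch`, the helper `heavy_items`, the parameters `epsilon`, `delta`, and `threshold`, and the rule "report an item when its estimated count reaches a given fraction of the stream" are all illustrative assumptions, not the paper's definition of effective data or its method.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative Count-Min sketch (an assumption, not the paper's algorithm)."""

    def __init__(self, epsilon, delta):
        # Standard sizing: overestimate is at most epsilon * total
        # with probability at least 1 - delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, row, item):
        # One salted hash per row, so rows behave like independent hash functions.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum over rows never underestimates the true count.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # Sketches built at different nodes with the same parameters
        # can be combined by adding their counters cell by cell.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total


def heavy_items(sketch, candidates, threshold):
    """Return candidates whose estimated share of the stream is >= threshold."""
    return [x for x in candidates if sketch.estimate(x) >= threshold * sketch.total]


if __name__ == "__main__":
    node_a = CountMinSketch(epsilon=0.01, delta=0.01)
    node_b = CountMinSketch(epsilon=0.01, delta=0.01)
    for _ in range(500):
        node_a.add("a")
    for _ in range(300):
        node_b.add("b")
    for i in range(200):
        node_b.add(i)
    node_a.merge(node_b)
    print(heavy_items(node_a, ["a", "b"], threshold=0.2))
```

In this sketch the memory footprint depends only on `epsilon` and `delta`, not on the stream length, which is the property the abstract alludes to; because the per-node tables can be added together, the same structure fits a distributed setting.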
Source
《中国海洋大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2006, Issue 6, pp. 885-888, 1012 (5 pages)
Periodical of Ocean University of China
Funding
Supported by the National Defense Major Basic Pre-Research Project (S0500A001)
Keywords
data stream
distributed data stream management system
frequent data items
effective data items