Min-wise hash function-based sampling over distributed data streams
Authors: 崇志宏, 倪巍伟, 徐立臻, 吕建华, 谢英豪. Journal of Southeast University (English Edition), indexed in EI and CAS, 2009, Issue 4, pp. 456-459 (4 pages)
To avoid redundant and inconsistent information in distributed data streams, a sampling method based on min-wise hash functions is designed, and the practical semantics of the union of distributed data streams is defined. First, for each family of min-wise hash functions, the data with the minimum hash value are selected as local samples, which filters out the bias caused by frequent updates in a single data stream. Second, for the same hash function, the local samples are combined at the center node and the sample with the minimum hash value is selected as the global sample, which filters out the bias caused by duplicated updates. Finally, based on the resulting uniform samples, several aggregations over the defined semantics of the union of data streams are precisely estimated. Comparison tests on synthetic and real-life data streams demonstrate the effectiveness of this method.
Keywords: data streams; aggregation; min-wise hashing
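The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the seeded BLAKE2b hash merely stands in for a family of min-wise hash functions, and the names `hash_value`, `local_samples`, `global_samples`, and `estimate_fraction` are hypothetical, introduced here for illustration only.

```python
import hashlib


def hash_value(item, seed):
    """Seeded 64-bit hash (assumption: stands in for one min-wise hash function)."""
    digest = hashlib.blake2b(
        repr(item).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "big"),
    ).digest()
    return int.from_bytes(digest, "big")


def local_samples(stream, num_hashes):
    """At one site: for each hash function, keep the item with the minimum hash.

    Repeated updates of the same item always hash to the same value, so they
    cannot change which item is kept.
    """
    best = [None] * num_hashes  # each entry is (hash, item) or None
    for item in stream:
        for seed in range(num_hashes):
            h = hash_value(item, seed)
            if best[seed] is None or h < best[seed][0]:
                best[seed] = (h, item)
    return best


def global_samples(per_site, num_hashes):
    """At the center node: for each hash function, keep the minimum over all sites."""
    merged = []
    for seed in range(num_hashes):
        candidates = [site[seed] for site in per_site if site[seed] is not None]
        merged.append(min(candidates, key=lambda c: c[0]) if candidates else None)
    return merged


def estimate_fraction(samples, predicate):
    """Estimate the fraction of distinct items in the union that satisfy predicate."""
    items = [s[1] for s in samples if s is not None]
    return sum(1 for x in items if predicate(x)) / len(items)


# Two hypothetical sites with frequent and duplicated updates.
site_a = [1, 1, 1, 2, 3]
site_b = [3, 4, 4, 5]
k = 16

merged = global_samples([local_samples(site_a, k), local_samples(site_b, k)], k)
reference = local_samples(sorted(set(site_a + site_b)), k)
even_share = estimate_fraction(merged, lambda x: x % 2 == 0)
```

In this sketch, `merged` matches `reference`, the samples that a single site would draw from the de-duplicated union, which is the sense in which update frequency and cross-site duplication do not bias the sample. The estimate `even_share` depends on the hash values drawn, so no particular figure is claimed for it.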