期刊文献+

基于Hamming范数的XML流相关性估测算法

Correlation Estimating Algorithm of XML Stream Based on Hamming Norms
下载PDF
导出
摘要 在数据库理论中,如何在较小的空间条件下快速地比较不同的XML(extensible markup language)流的差异性是一个基本问题.在这一问题的研究中,人们提出了树编辑距离等测度来描述XML文本的差异性.提出了一种基于Hamming范数的l0测度——即XML树的不同子树的个数,并以此来刻画XML文本的相关性.在数据流模型下,给出了基于空间有界伪随机数发生器、稳态分布于哈希函数的l0测度的概率算法.理论上的时空复杂性分析、正确性证明与实验模拟结果表明,这一概率算法对问题的输入提供了一个理想的近似. It is of great importance to compare the correlation of different XML (extensible markup language) streams in the limited space in the Database Theory. In the study of these problems, several measures are proposed, e.g. the tree-edit distance, to show the difference of XML trees. This paper proposes a natural measure lo employing Hamming norms, i.e. the number of distinct sub-trees between two XML trees, to estimate the correlation. Furthermore, a probabilistic estimating algorithm involving space-bounded pseudorandom generators, stable distributions and hash functions has been presented in the data stream model. Theoretical time/space complexity analysis, correctness proof and experimental simulation show that this algorithm can give a desired approximation.
作者 孙贺 朱洪
出处 《软件学报》 EI CSCD 北大核心 2010年第4期672-679,共8页 Journal of Software
基金 国家高技术研究发展计划(863)No.2007AA01Z189 上海重点学科建设项目资助No.B412~~
关键词 算法设计 数据流 Hamming范数 稳态分布 XML(extensible MARKUP language) algorithm design data stream Hamming norm stable distribution XML (extensible markuplanguage)
  • 相关文献

参考文献13

  • 1Flajolet P,Martin GN.Probabilistic counting algorithms for data base applications.Journal of Computer and System Sciences,1985,31(2):182-209.[doi:10.1016/0022-0000(85)90041-8].
  • 2Alon N,Matias Y,Szegedy M.The space complexity of approximating the frequency moments.Journal of Computer and System Sciences,1999,58(1):137-147.[doi:10.1006/jcss.1997.1545].
  • 3Garofalakis M,Kumar A.XML stream processing using tree-edit distance embeddings.ACM Trans.on Database Systems,2005,30(1):279-332.[doi:10.1145/1061318.1061326].
  • 4Polyzotis N,Garofalakis M,Ioannidis Y.Approximate XML query answers.In:Weikum G,ed.Proc.of the 2004 ACM SIGMOD Int'l Conf.on Management of Data.New York:ACM Press,2004.263-274.
  • 5Bar-Yossef Z,Jayram TS,Kumar R,Sivakumar D,Trevisan L.Counting distinct elements in a data stream.In:Rolim JDP,Vadhan S,eds.Proc.of the 6th Int'l Workshop on Randomization and Approximation Techniques in Computer Science.Berlin,Heidelberg:Springer-Verlag,2002.1-10.
  • 6Gibbons PB,Tirthapura S.Estimating simple functions on the union of data streams.In:Rosenberg A,ed.Proc.of the Symp.on Parallel Algorithms and Architectures.New York:ACM Press,2001.281-291.
  • 7Indyk P.Stable distributions,pseudorandom generators,embeddings and data stream computation.In:Proc.of the 41st Symp.on Foundations of Computer Science.Cambridge:IEEE Computer Society Press,2000.189-197.http://portal.acm.org/citation.cfm? id=795666.796606.
  • 8Cormode G,Datar M,Indyk P,Muthukrishnan S.Comparing data streams using Hamming norms (how to zero in).In:Ramakrishnan R,ed.Proc.of the 28th Int'l Conf.on Very Large Data Bases.New York:VLDB Press,2002.335-345.
  • 9Cormode G,Muthukrishnan S.Estimating dominance norms of multiple data streams.In:Battista GD,Zwick U,eds.Proc.of the 11th European Symp.on Algorithms.Berlin,Heidelberg:Springer-Verlag,2003.148-160.
  • 10Zolotarev V.One-Dimensional Stable Distributions.In:Vol.65 of Translations of Mathematical Monographs.American Mathematical Society Press,1986.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部