基于Hamming范数的XML流相关性估测算法

Correlation Estimating Algorithm of XML Stream Based on Hamming Norms

下载PDF

导出

摘要在数据库理论中,如何在较小的空间条件下快速地比较不同的XML(extensible markup language)流的差异性是一个基本问题.在这一问题的研究中,人们提出了树编辑距离等测度来描述XML文本的差异性.提出了一种基于Hamming范数的l0测度——即XML树的不同子树的个数,并以此来刻画XML文本的相关性.在数据流模型下,给出了基于空间有界伪随机数发生器、稳态分布于哈希函数的l0测度的概率算法.理论上的时空复杂性分析、正确性证明与实验模拟结果表明,这一概率算法对问题的输入提供了一个理想的近似. It is of great importance to compare the correlation of different XML （extensible markup language） streams in the limited space in the Database Theory. In the study of these problems, several measures are proposed, e.g. the tree-edit distance, to show the difference of XML trees. This paper proposes a natural measure lo employing Hamming norms, i.e. the number of distinct sub-trees between two XML trees, to estimate the correlation. Furthermore, a probabilistic estimating algorithm involving space-bounded pseudorandom generators, stable distributions and hash functions has been presented in the data stream model. Theoretical time/space complexity analysis, correctness proof and experimental simulation show that this algorithm can give a desired approximation.

作者孙贺朱洪

机构地区复旦大学上海市智能信息处理重点实验室华东师范大学上海市高可信计算重点实验室

出处《软件学报》 EI CSCD 北大核心 2010年第4期672-679,共8页 Journal of Software

基金国家高技术研究发展计划(863)No.2007AA01Z189 上海重点学科建设项目资助No.B412~~

关键词算法设计数据流 Hamming范数稳态分布 XML(extensible MARKUP language) algorithm design data stream Hamming norm stable distribution XML （extensible markuplanguage）

分类号 TP311 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献13

1Flajolet P,Martin GN.Probabilistic counting algorithms for data base applications.Journal of Computer and System Sciences,1985,31(2):182-209.[doi:10.1016/0022-0000(85)90041-8].
2Alon N,Matias Y,Szegedy M.The space complexity of approximating the frequency moments.Journal of Computer and System Sciences,1999,58(1):137-147.[doi:10.1006/jcss.1997.1545].
3Garofalakis M,Kumar A.XML stream processing using tree-edit distance embeddings.ACM Trans.on Database Systems,2005,30(1):279-332.[doi:10.1145/1061318.1061326].
4Polyzotis N,Garofalakis M,Ioannidis Y.Approximate XML query answers.In:Weikum G,ed.Proc.of the 2004 ACM SIGMOD Int'l Conf.on Management of Data.New York:ACM Press,2004.263-274.
5Bar-Yossef Z,Jayram TS,Kumar R,Sivakumar D,Trevisan L.Counting distinct elements in a data stream.In:Rolim JDP,Vadhan S,eds.Proc.of the 6th Int'l Workshop on Randomization and Approximation Techniques in Computer Science.Berlin,Heidelberg:Springer-Verlag,2002.1-10.
6Gibbons PB,Tirthapura S.Estimating simple functions on the union of data streams.In:Rosenberg A,ed.Proc.of the Symp.on Parallel Algorithms and Architectures.New York:ACM Press,2001.281-291.
7Indyk P.Stable distributions,pseudorandom generators,embeddings and data stream computation.In:Proc.of the 41st Symp.on Foundations of Computer Science.Cambridge:IEEE Computer Society Press,2000.189-197.http://portal.acm.org/citation.cfm? id=795666.796606.
8Cormode G,Datar M,Indyk P,Muthukrishnan S.Comparing data streams using Hamming norms (how to zero in).In:Ramakrishnan R,ed.Proc.of the 28th Int'l Conf.on Very Large Data Bases.New York:VLDB Press,2002.335-345.
9Cormode G,Muthukrishnan S.Estimating dominance norms of multiple data streams.In:Battista GD,Zwick U,eds.Proc.of the 11th European Symp.on Algorithms.Berlin,Heidelberg:Springer-Verlag,2003.148-160.
10Zolotarev V.One-Dimensional Stable Distributions.In:Vol.65 of Translations of Mathematical Monographs.American Mathematical Society Press,1986.

1陈猛,楚广琳.数据流分类研究综述[J].科技信息,2012(22):239-240.
2王涛,李舟军,颜跃进,陈火旺.人工智能数据流挖掘分类技术综述[J].中国学术期刊文摘,2008,14(10):8-8.
3胡波,覃溪,龙先子.基于机器视觉的西瓜体积估测算法[J].科学时代,2013(21). 被引量：1
4非约.标注网页,轻松分享[J].网友世界,2010(22):61-61.
5沈达凡.XML：新一代Web页面标记语言[J].电子计算机,2000(3):30-32.
6张亮,李然,汪卫,施伯乐.XML数据物化模式的生成与优化技术[J].软件学报,2007,18(2):323-331. 被引量：1
7李红梅,郝文宁,陈刚.基于改进LSH的协同过滤推荐算法[J].计算机科学,2015,42(10):256-261. 被引量：13
8郑丽萍.交遇区样本分类的应用[J].山东理工大学学报（自然科学版）,2004,18(6):57-60.
9王涛,李舟军,颜跃进,陈火旺.数据流挖掘分类技术综述[J].计算机研究与发展,2007,44(11):1809-1815. 被引量：41
10陈晓磊.物联网架构下风电机组远程状态监测系统中间件设计[J].数字技术与应用,2012,30(11):4-5. 被引量：2

软件学报

2010年第4期

浏览历史

内容加载中请稍等...

基于Hamming范数的XML流相关性估测算法

参考文献13

相关作者

相关机构

相关主题

浏览历史