Study on an XML approximately duplicated data cleaning method (cited by: 7)
Abstract: Given the importance of semi-structured XML data in data cleaning, this paper studies how to clean approximately duplicated XML data. The main contributions are as follows. An effective method for cleaning approximately duplicated XML data is proposed; the method is highly adaptable, since any XML similarity detection algorithm can be used within it. A similarity detection algorithm based on tree edit distance is presented, which detects approximately duplicated XML data effectively. The lower and upper bounds of the tree edit distance are used to optimize this algorithm, which avoids unnecessary tree edit distance computations between pairs of XML data, reduces the computational complexity of similarity detection, and improves efficiency. This lays a foundation for further research on cleaning approximately duplicated XML data.
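Authors: Chen Wei (陈伟), Ding Qiulin (丁秋林)
Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2004, No. 9, pp. 835-838 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: rule base; algorithm base; data cleaning; Extensible Markup Language (XML); approximately duplicated data; Algorithms; Computational complexity; Navier Stokes equations; Structured programming; Trees (mathematics); XML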
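
The bound-based filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the record gives no details of the bounds or of the tree representation, so everything below is an assumption made for illustration. Trees are nested tuples built by a hypothetical helper `node`, all edit operations cost 1, the lower bound is a label-multiset count, the upper bound is a top-down distance that inserts or deletes whole subtrees only, and the exact distance is a plain memoized forest recursion. The hypothetical function `check_pair` computes the exact distance only when the two bounds cannot settle the question.

```python
from collections import Counter
from functools import lru_cache

# Assumption: a tree is a hashable nested tuple (label, (child, child, ...)).
def node(label, *children):
    return (label, tuple(children))

def size(t):
    return 1 + sum(size(c) for c in t[1])

def labels(t):
    counts = Counter([t[0]])
    for child in t[1]:
        counts += labels(child)
    return counts

def lower_bound(a, b):
    # At most `shared` nodes can be matched at zero cost, so the unit-cost
    # tree edit distance is at least max(|a|, |b|) - shared.
    shared = sum((labels(a) & labels(b)).values())
    return max(size(a), size(b)) - shared

def upper_bound(a, b):
    # Top-down distance: children are aligned by a sequence edit distance in
    # which whole subtrees are inserted or deleted. Every such script is also
    # a valid general edit script, so this never underestimates.
    xs, ys = a[1], b[1]
    d = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i, x in enumerate(xs, 1):
        d[i][0] = d[i - 1][0] + size(x)
    for j, y in enumerate(ys, 1):
        d[0][j] = d[0][j - 1] + size(y)
    for i, x in enumerate(xs, 1):
        for j, y in enumerate(ys, 1):
            d[i][j] = min(d[i - 1][j] + size(x),
                          d[i][j - 1] + size(y),
                          d[i - 1][j - 1] + upper_bound(x, y))
    return (a[0] != b[0]) + d[-1][-1]

@lru_cache(maxsize=None)
def forest_dist(f, g):
    # Exact unit-cost edit distance between two ordered forests.
    if not f:
        return sum(size(t) for t in g)
    if not g:
        return sum(size(t) for t in f)
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + f_kids, g) + 1,   # delete rightmost root of f
        forest_dist(f, g[:-1] + g_kids) + 1,   # insert rightmost root of g
        forest_dist(f_kids, g_kids) + forest_dist(f[:-1], g[:-1])
        + (f_label != g_label),                # match the two roots
    )

def tree_edit_distance(a, b):
    return forest_dist((a,), (b,))

def check_pair(a, b, threshold):
    """Return (is_duplicate, stage that decided)."""
    if lower_bound(a, b) > threshold:
        return False, "lower bound"
    if upper_bound(a, b) <= threshold:
        return True, "upper bound"
    return tree_edit_distance(a, b) <= threshold, "exact"

r1 = node("person", node("name", node("Wei Chen")), node("city", node("Nanjing")))
r2 = node("person", node("name", node("W. Chen")), node("city", node("Nanjing")))
r3 = node("order", node("id", node("42")))
t1 = node("a", node("b", node("c"), node("d")))
t2 = node("a", node("c"), node("d"))

print(check_pair(r1, r2, 1))  # (True, 'upper bound')
print(check_pair(r1, r3, 2))  # (False, 'lower bound')
print(check_pair(t1, t2, 1))  # (True, 'exact')
```

In this sketch only the third pair reaches the exact computation; the first two are decided by the cheaper bounds, which is the saving the abstract refers to. Tighter bounds would decide more pairs early at the cost of more work per bound.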

References (5)

  • [1] Rahm E, Do H H. Data cleaning: problems and current approaches[J]. IEEE Data Engineering Bulletin, 2000, 23(4): 3-13
  • [2] Galhardas H, Florescu D, Shasha D, et al. Declarative data cleaning: language, model, and algorithms[A]. In: Apers P, Atzeni P, Ceri S, eds. Proceedings of the 27th VLDB Conference[C]. Roma: Morgan Kaufmann, 2001. 371-380
  • [3] Monge A E. Matching algorithms within a duplicate detection system[J]. IEEE Data Engineering Bulletin, 2000, 23(4): 14-20
  • [4] Zhang K, Shasha D. Tree pattern matching[M]. London: Oxford University Press, 1997
  • [5] Guha S, Jagadish H V, Koudas N, et al. Approximate XML joins[A]. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data[C]. Madison: ACM Press, 2002

Co-cited documents: 129

Citing documents: 7

Second-level citing documents: 28
