Semantic Edit Distance between Two Directed Labeled and Rooted Trees
Abstract: In graph theory, computing the tree edit distance (TED) between two directed, labeled, rooted trees is a widely studied problem. As a combinatorial optimization problem, TED computation is widely used to measure the structural similarity of semi-structured documents. This paper proposes the concept of tree semantic edit distance (TSED) and gives a formula for computing it. A distance measure that combines TED and TSED is then presented and applied to clustering the document object model (DOM) trees of extensible markup language (XML) documents. Experimental results show that the proposed measure clearly outperforms measures based on TED alone in clustering precision and recall. The time complexity of the proposed algorithm matches that of the best dynamic-programming algorithms for TED.
Authors: Kang Qi (康琪), Ma Jun (马军)
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 6, pp. 816-824 (9 pages)
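Funding: Supported by the National Natural Science Foundation of China (No. 60970047), the China Postdoctoral Science Foundation (No. 20100471503), the Natural Science Foundation of Shandong Province (No. Y2008G19), and the Shandong Province Key Science and Technology Program (No. 2007GG10001002, 2008GG10001026)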
Keywords: Tree Edit Distance, Document Clustering, Structural Similarity, Semantic Similarity
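
For readers unfamiliar with the tree edit distance that the abstract builds on, the following is a minimal sketch of the classic forest recursion for ordered labeled trees, memoized so that it behaves like a dynamic program. It is not the paper's algorithm: unit costs for insert, delete, and relabel are an assumption, the tuple-based tree encoding and the names `tree_size`, `forest_distance`, and `tree_edit_distance` are hypothetical, and the TSED formula is not reproduced here because this record does not state it.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a
# tuple of trees. A forest is a tuple of trees. Tuples keep everything
# hashable so that results can be memoized.


def tree_size(tree):
    """Number of nodes in a tree."""
    _label, children = tree
    return 1 + sum(tree_size(child) for child in children)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        # Every node of f2 has to be inserted.
        return sum(tree_size(t) for t in f2)
    if not f2:
        # Every node of f1 has to be deleted.
        return sum(tree_size(t) for t in f1)

    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]

    # Delete the root of the rightmost tree in f1; its children
    # take its place in the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the root of the rightmost tree in f2 (symmetric case).
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    relabel = 0 if label1 == label2 else 1
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(kids1, kids2)
             + relabel)
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered labeled trees."""
    return forest_distance((t1,), (t2,))


# Two small DOM-like trees that differ in a single leaf label.
doc_a = ("book", (("title", ()), ("author", ())))
doc_b = ("book", (("title", ()), ("editor", ())))
print(tree_edit_distance(doc_a, doc_b))
```

The two sample trees differ only in one leaf label, so a single relabel operation separates them. A semantic variant along the lines the abstract describes would presumably replace the 0/1 relabel cost with a label-similarity score, but that is a guess rather than something this record confirms.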