基于语义和结构的XML文档相似度的计算方法 (cited by: 3)

XML Document Similarity Measure Based on Semantics and Structure
Abstract: Personalized information services learn users' interests and preferences in order to provide different information services to different users. XML is a markup language and a widely used standard for representing and exchanging documents on the Web, so computing the similarity between XML documents is important for personalized recommendation and information retrieval. This paper proposes XMLSim, a method for computing the semantic and structural similarity between XML documents. First, the similarity between a pair of node tags is computed from their semantic similarity and their edit distance. Then, after analyzing the partial order among the nodes on a path, the path similarity problem is abstracted as a Maximal Similar Subsequence (MSS) problem, which is solved by dynamic programming to obtain the path similarity NPathSim. Finally, the similarity XMLSim between two XML documents is obtained as the average of the maximal NPathSim values between their path sets.
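Source: 《中文信息学报》 (Journal of Chinese Information Processing), CSCD, Peking University Core Journal (北大核心), 2012, No. 5, pp. 59-64 (6 pages)
Funding: National Natural Science Foundation of China (61170052); Shandong Higher Education Society "Twelfth Five-Year Plan" Higher Education Research Project (YBKT2011063); Shandong Jianzhu University Doctoral Fund (XNBS1028)
Keywords: XML; similarity; dynamic programming; semantics and structure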
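The three steps named in the abstract (tag similarity, path similarity by dynamic programming, document similarity as an average of best path matches) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not give the paper's formulas, so the normalization by the longer length, the weight `alpha`, the optional `semantic` callable (a stand-in for a WordNet-style measure), the symmetric averaging in `xml_sim`, and all function names are assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tag_sim(a: str, b: str,
            semantic: Optional[Callable[[str, str], float]] = None,
            alpha: float = 0.5) -> float:
    """Similarity of two node tags in [0, 1].

    Assumption: string similarity is 1 - distance / longer length, and it is
    blended with an optional semantic score using the weight alpha.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    string_sim = 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
    if semantic is None:
        return string_sim
    return alpha * semantic(a, b) + (1.0 - alpha) * string_sim


def npath_sim(p: Sequence[str], q: Sequence[str],
              sim: Callable[[str, str], float] = tag_sim) -> float:
    """Order-preserving path similarity.

    Assumption: the maximal similar subsequence is modelled as a weighted
    longest-common-subsequence score, normalised by the longer path.
    """
    if not p or not q:
        return 0.0
    n, m = len(p), len(q)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j],      # skip a tag of p
                             best[i][j - 1],      # skip a tag of q
                             best[i - 1][j - 1] + sim(p[i - 1], q[j - 1]))
    return best[n][m] / max(n, m)


def xml_sim(paths_a: List[Sequence[str]], paths_b: List[Sequence[str]]) -> float:
    """Average of the best NPathSim per path, taken in both directions."""
    if not paths_a or not paths_b:
        return 0.0

    def directed(src, dst):
        return sum(max(npath_sim(p, q) for q in dst) for p in src) / len(src)

    return (directed(paths_a, paths_b) + directed(paths_b, paths_a)) / 2.0
```

Here each path is a list of tag names from the root to a leaf, for example `["book", "title"]`. Because the alignment preserves order, a path and its reversal score lower than two identical paths, which reflects the partial order among nodes that the abstract refers to.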

References (13)

  • 1 Zheng Shihui, Zhou Aoying, Zhang Long. Research on similarity measures and structural indexing of XML documents[J]. 计算机学报 (Chinese Journal of Computers), 2003, 26(9): 1116-1122. Cited by: 28
  • 2 Zhang K, Statman R, Shasha D. On the editing distance between unordered labeled trees[J]. Information Processing Letters, 1992, 42(3): 133-139.
  • 3 Nierman A, Jagadish H V. Evaluating Structural Similarity in XML Documents[DB/OL]. 2002, citeseerx.ist.psu.edu, 61-66.
  • 4 Nayak R. Investigating Semantic Measures in XML Clustering[C]//Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence, 2006: 1042-1045.
  • 5 Joshi S, Agrawal N, Krishnapuram R, et al. A bag of paths model for measuring structural similarity in Web documents[C]//Proceedings of Knowledge Discovery and Data Mining. Washington, D.C.: ACM Press, 2003: 577-582.
  • 6 Nayak R, Iryadi W. XML schema clustering with semantic and hierarchical similarity measures[J]. Knowledge-Based Systems, 2007, 20(4): 336-349.
  • 7 Zhao Yan, Ma Jun, Li Sen. A method for computing the relevance of structured documents[C]//The 2nd China Conference on Classification Technology and Applications. Zhengzhou, 2007-05-27: 350-355.
  • 8 Jeong B, Lee D, Cho H, et al. A novel method for measuring semantic similarity for XML schema matching[J]. Expert Systems with Applications, 2008, 34(3): 1651-1658.
  • 9 Levenshtein V. Binary codes capable of correcting deletions, insertions, and reversals[J]. Soviet Physics Doklady, 1966, 10(8): 707-710.
  • 10 Princeton University. WordNet[DB/OL]. 2011, http://wordnet.princeton.edu/.

Secondary references (15)

  • 1 XQuery: A query language for XML. W3C Working Draft 15 February 2001, available: http://www.w3.org/TR/xquery/.
  • 2 Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 1987, 16(6): 973-989.
  • 3 Henzinger M R, Henzinger T A, Kopke P W. Computing simulations on finite and infinite graphs. In: Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, 1995. 453-462.
  • 4 Marian A, Abiteboul S, Cobena G, Mignet L. Change-centric management of versions in an XML warehouse. In: Proceedings of the 27th International Conference on Very Large Data Bases, Roma, Italy, 2001. 581-590.
  • 5 Goldman R, Widom J. Summarizing and searching sequential semistructured sources. Stanford University: Technical Report TR20000312, 2000.
  • 6 Zheng Shi-Hui, Zhou Ao-Ying et al. Structure-based approximate searching in XML data. Fudan University: Technical Report TR20010203, 2001.
  • 7 Wang J T-L, Shasha D et al. Structural matching and discovery in document databases. Sigmod Record, 1997, 26(2): 560-564.
  • 8 Zhang K. A constrained editing distance between unordered labeled trees. Journal of Algorithmica, 1996, 15(3): 205-222.
  • 9 Zhang K, Shasha D. On the editing distance between unordered labeled trees. Information Processing Letters, 1992, 42(3): 133-139.
  • 10 Wang J T-L, Zhang K et al. Exact and approximate algorithms for unordered tree matching. IEEE Transactions on Systems, Man and Cybernetics, 1994, 24(4): 668-678.

Co-citing documents (27)

Co-cited documents (25)

  • 1 Lenzerini M. Data integration: a theoretical perspective[C]//PODS. New York, USA, 2002: 233-246.
  • 2 Kolaitis P G. Schema mappings, data exchange, and metadata management[C]//PODS. New York, USA, 2005: 61-75.
  • 3 Bernstein P A, Melnik S. Model management 2.0: manipulating richer mappings[C]//SIGMOD. New York, USA, 2007: 1-12.
  • 4 Barceló P. Logical foundations of relational data exchange[J]. SIGMOD Rec., 2009, 38: 49-58.
  • 5 Fagin R, Kolaitis P G, Popa L. Data exchange: getting to the core[J]. ACM Trans. Database Syst., 2005, 30: 174-210.
  • 6 Gottlob G, Nash A. Data exchange: computing cores in polynomial time[C]//PODS. New York, USA, 2006: 40-49.
  • 7 Libkin L, Sirangelo C. Data exchange and schema mappings in open and closed worlds[J]. Journal of Computer and System Sciences (In Press, Corrected Proof), 2010.
  • 8 Fagin R, Kimelfeld B, Kolaitis P G. Probabilistic data exchange[C]//ICDT. New York, USA, 2010: 76-88.
  • 9 Arenas M, Libkin L. XML data exchange: Consistency and query answering[J]. J. ACM, 2008, 55: 1-72.
  • 10 Amano S, Libkin L, Murlak F. XML schema mappings[C]//PODS. New York, USA, 2009: 33-42.

Citing documents (3)

Secondary citing documents (2)
